Deep Learning

instructor: Xiaogang Wang. The Chinese University of Hong Kong - Spring 2015
intro: Homework, Homework Solutions, Lecture Notes, General Resources, Tutorial Notes, CUDA/GPU programming tutorial
homepage: https://piazza.com/cuhk.edu.hk/spring2015/eleg5040/resources

Self-Study Courses for Deep Learning (NVIDIA Deep Learning Institute)

homepage: https://developer.nvidia.com/deep-learning-courses

Introduction to Deep Learning

homepage: https://beta.bigdatauniversity.com/courses/introduction-deep-learning/

Deep Learning Courses

blog: http://machinelearningmastery.com/deep-learning-courses/

Creative Applications of Deep Learning w/ Tensorflow

homepage: https://www.kadenze.com/courses/creative-applications-of-deep-learning-with-tensorflow-i/info
github(Course materials/Homework materials): https://github.com/pkmital/CADL

Deep Learning School: September 24-25, 2016 Stanford, CA

homepage: http://www.bayareadlschool.org/
day 1: https://www.youtube.com/watch?v=9dXiAecyJrY
day 2: https://www.youtube.com/watch?v=eyovmAtoUx0
github: https://github.com/lamblin/bayareadlschool
reddit: https://amp.reddit.com/r/MachineLearning/comments/54shmi/great_new_introductory_talks_on_various_subfields/
mirror: https://pan.baidu.com/s/1gfBe2fL

CSC 2541 Fall 2016: Differentiable Inference and Generative Models

homepage: http://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html

CS 294-131: Special Topics in Deep Learning (Fall, 2016)

https://berkeley-deep-learning.github.io/cs294-dl-f16/

Fork of Lempitsky DL for HSE master students.

github: https://github.com/yandexdataschool/HSE_deeplearning

ELEG 5040: Advanced Topics in Signal Processing (Introduction to Deep Learning)

resources: https://piazza.com/cuhk.edu.hk/spring2015/eleg5040/resources

CS 20SI: Tensorflow for Deep Learning Research

homepage: http://web.stanford.edu/class/cs20si/
github: https://github.com/chiphuyen/stanford-tensorflow-tutorials

Deep Learning with TensorFlow

https://bigdatauniversity.com/courses/deep-learning-tensorflow/

Deep Learning course

github: https://github.com/ddtm/dl-course

CSE 599G1: Deep Learning System

homepage: http://dlsys.cs.washington.edu/
assignments: http://dlsys.cs.washington.edu/assignments

CSC 321 Winter 2017: Intro to Neural Networks and Machine Learning

http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/

Theories of Deep Learning (STATS 385)

homepage: https://stats385.github.io/
video: https://www.researchgate.net/project/Theories-of-Deep-Learning
mirror: https://www.bilibili.com/video/av16136625/

CS230: Deep Learning Spring 2018

https://web.stanford.edu/class/cs230/

With Video Lectures

Deep Learning: Taking machine learning to the next level (Udacity)

instructor: Vincent Vanhoucke (Google), Arpan Chakraborty
homepage: https://www.udacity.com/course/deep-learning--ud730
homepage: https://cn.udacity.com/course/deep-learning--ud730/
homepage: https://classroom.udacity.com/courses/ud730/lessons/6370362152/concepts/63798118150923
assignments: https://github.com/tdhopper/udacity-deep-learning
ipn: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/1_notmnist.ipynb
ipn: http://nbviewer.jupyter.org/github/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/1_notmnist.ipynb
assignments: https://github.com/Arn-O/udacity-deep-learning

Neural networks class - Université de Sherbrooke

instructor: Hugo Larochelle
youtube: https://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
video: http://pan.baidu.com/s/1bnwEe8R
course content: http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html
google group: https://groups.google.com/forum/#!forum/neural-networks-online-course

Deep Learning: Theoretical Motivations

author: Yoshua Bengio
published: Sept. 13, 2015. (Deep Learning Summer School, Montreal 2015)
video: http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
blog: http://rinuboney.github.io/2015/10/18/theoretical-motivations-deep-learning.html

University of Waterloo: STAT 946 - Deep Learning

homepage: https://uwaterloo.ca/data-science/deep-learning
video+slides: http://pan.baidu.com/s/1sjTRgjN

Deep Learning (2016) - BME 595A, Eugenio Culurciello, Purdue University

course schedule: http://t.cn/RVYQa69?u=1402400261&m=4034720314226808&cu=2261580215&ru=1402400261&rm=4034708389597157
mirror: https://pan.baidu.com/s/1hsBJOpQ
video: https://www.youtube.com/playlist?list=PLNgy4gid0G9cbw5OjwG2jxvFqYDqkGnpJ
mirror: https://pan.baidu.com/s/1bpKb5Cj

UVA DEEP LEARNING COURSE

intro: MSc in Artificial Intelligence for the University of Amsterdam.
homepage: http://uvadlc.github.io/
assignments: https://github.com/uvadlc/uvadlc_practicals_2016

Practical Deep Learning For Coders, Part 1

intro: 10 hours a week for 7 weeks
homepage: http://course.fast.ai/
youtube: https://www.youtube.com/playlist?list=PLfYUBJiXbdtS2UQRzyrxmyVHoGW0gmLSM
mirror: https://pan.baidu.com/s/1eRLK742#list/path=%2F
github: https://github.com/fastai/courses
blog: http://www.kdnuggets.com/2016/12/deep-learning-coders-mooc-jeremy-howard.html

T81-558:Applications of Deep Neural Networks

intro: Washington University
course page: https://sites.wustl.edu/jeffheaton/t81-558/
youtube: https://www.youtube.com/playlist?list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN
github: https://github.com/jeffheaton/t81_558_deep_learning

CS294-129 Designing, Visualizing and Understanding Deep Neural Networks

MIT 6.S191: Introduction to Deep Learning

homepage: http://introtodeeplearning.com/index.html
schedule(Slides+Videos): http://introtodeeplearning.com/schedule.html
github: https://github.com/yala/introdeeplearning
youtube: https://www.youtube.com/playlist?list=PLkkuNyzb8LmxFutYuPA7B4oiMn6cjD6Rs
mirror: https://pan.baidu.com/s/1qXXDCoG#list/path=%2F

Edx: Deep Learning Explained

intro: Microsoft
course page: https://www.edx.org/course/deep-learning-explained-microsoft-dat236x

Computer Vision

Stanford CS231n: Convolutional Neural Networks for Visual Recognition (Spring 2017)

Stanford CS231n: Convolutional Neural Networks for Visual Recognition (Winter 2016)

homepage: http://cs231n.stanford.edu/
homepage: http://vision.stanford.edu/teaching/cs231n/index.html
syllabus: http://vision.stanford.edu/teaching/cs231n/syllabus.html
course notes: http://cs231n.github.io/
youtube: https://www.youtube.com/watch?v=NfnWJUyUJYU&feature=youtu.be
mirror: http://pan.baidu.com/s/1pKsTivp
mirror: http://pan.baidu.com/s/1c2wR8dy
assignment 1: http://cs231n.github.io/assignments2016/assignment1/
assignment 2: http://cs231n.github.io/assignments2016/assignment2/
assignment 3: http://cs231n.github.io/assignments2016/assignment3/

ITP-NYU - Spring 2016

Video lectures and course notes: http://ml4a.github.io/classes/itp-S16/

Deep Learning for Computer Vision Barcelona: Summer seminar UPC TelecomBCN (July 4-8, 2016)

intro: This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
homepage(slides+videos): http://imatge-upc.github.io/telecombcn-2016-dlcv/
homepage: https://imatge.upc.edu/web/teaching/deep-learning-computer-vision
youtube: https://www.youtube.com/user/imatgeupc/videos?shelf_id=0&sort=dd&view=0

DLCV - Deep Learning for Computer Vision

homepage: https://imatge.upc.edu/web/teaching/deep-learning-computer-vision

Advanced Computer Vision Cap6412

Natural Language Processing

CS224n: Natural Language Processing with Deep Learning

intro: This course is a merger of Stanford’s previous cs224n course and cs224d
homepage: http://web.stanford.edu/class/cs224n/

Course notes for CS224N Winter17

https://github.com/stanfordnlp/cs224n-winter17-notes

Stanford CS224d: Deep Learning for Natural Language Processing

homepage: http://cs224d.stanford.edu/
syllabus: http://cs224d.stanford.edu/syllabus.html
lecture notes: https://cs224d.stanford.edu/lecture_notes/

Code for Stanford CS224D: deep learning for natural language understanding

github: https://github.com/bogatyy/cs224d

CMU CS 11-747, Fall 2017: Neural Networks for NLP

intro: by Graham Neubig
course page: http://phontron.com/class/nn4nlp2017/
github: https://github.com/neubig/nn4nlp2017-code
video: https://www.bilibili.com/video/av14153689/

Deep Learning for NLP - Lecture October 2015

github: https://github.com/UKPLab/deeplearning4nlp-tutorial/tree/master/2015-10_Lecture

Harvard University: CS287: Natural Language Processing

http://cs287.fas.harvard.edu/

Deep Learning for Natural Language Processing: 2016-2017

intro: Oxford Deep NLP 2017 course
homepage: http://www.cs.ox.ac.uk/teaching/courses/2016-2017/dl/
github: https://github.com/oxford-cs-deepnlp-2017/lectures
youtube: https://www.youtube.com/playlist?list=PL613dYIGMXoZBtZhbyiBqb0QtgK6oJbpm
mirror: https://pan.baidu.com/s/1dFvGHUh#list/path=%2F
mirror: https://pan.baidu.com/s/1c2tcC96

GPU Programming

Course on CUDA Programming on NVIDIA GPUs, July 27–31, 2015

homepage: http://people.maths.ox.ac.uk/gilesm/cuda/

An Introduction to GPU Programming using Theano

youtube: https://www.youtube.com/watch?v=eVd2TqEkVp0
video: http://pan.baidu.com/s/1c1i6LtI#path=%252F

GPU Programming

homepage: http://courses.cms.caltech.edu/cs179/

Parallel Programming

Intro to Parallel Programming Using CUDA to Harness the Power of GPUs (Udacity)

https://www.udacity.com/course/intro-to-parallel-programming--cs344

Fundamentals of Accelerated Computing with CUDA C/C++

intro: Learn to use CUDA C/C++ tools and techniques to accelerate CPU-only applications to run on massively parallel GPUs.
homepage: https://courses.nvidia.com/courses/course-v1:DLI+C-AC-01+V1/about

Workshops

Deep Learning: Theory, Algorithms, and Applications

homepage: http://doc.ml.tu-berlin.de/dlworkshop2017/
video: https://www.youtube.com/playlist?list=PLJOzdkh8T5kqCNV_v1w2tapvtJDZYiohW
mirror: https://www.bilibili.com/video/av15565354/

Resources

Open Source Deep Learning Curriculum

http://www.deeplearningweekly.com/pages/open_source_deep_learning_curriculum

Published: 09 Oct 2015

Applications

Acceleration and Model Compression

Papers

Im2Text: Describing Images Using 1 Million Captioned Photographs

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

intro: Oral presentation at CVPR 2015. LRCN
project page: http://jeffdonahue.com/lrcn/
arxiv: http://arxiv.org/abs/1411.4389
github: https://github.com/BVLC/caffe/pull/2033

Show and Tell

Show and Tell: A Neural Image Caption Generator

intro: Google
arxiv: http://arxiv.org/abs/1411.4555
github: https://github.com/karpathy/neuraltalk
gitxiv: http://gitxiv.com/posts/7nofxjoYBXga5XjtL/show-and-tell-a-neural-image-caption-nic-generator
github: https://github.com/apple2373/chainer_caption_generation
github(TensorFlow): https://github.com/tensorflow/models/tree/master/im2txt
github(TensorFlow): https://github.com/zsdonghao/Image-Captioning

Image caption generation by CNN and LSTM

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

arxiv: http://arxiv.org/abs/1609.06647
github: https://github.com/tensorflow/models/tree/master/im2txt

Learning a Recurrent Visual Representation for Image Caption Generation

arxiv: http://arxiv.org/abs/1411.5654

Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation

intro: CVPR 2015
paper: http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf

Deep Visual-Semantic Alignments for Generating Image Descriptions

intro: “propose a multimodal deep network that aligns various interesting regions of the image, represented using a CNN feature, with associated words. The learned correspondences are then used to train a bi-directional RNN. This model is able, not only to generate descriptions for images, but also to localize different segments of the sentence to their corresponding image regions.”
project page: http://cs.stanford.edu/people/karpathy/deepimagesent/
arxiv: http://arxiv.org/abs/1412.2306
slides: http://www.cs.toronto.edu/~vendrov/DeepVisualSemanticAlignments_Class_Presentation.pdf
github: https://github.com/karpathy/neuraltalk
demo: http://cs.stanford.edu/people/karpathy/deepimagesent/rankingdemo/

Deep Captioning with Multimodal Recurrent Neural Networks

intro: m-RNN. ICLR 2015
intro: “combines the functionalities of the CNN and RNN by introducing a new multimodal layer, after the embedding and recurrent layers of the RNN.”
homepage: http://www.stat.ucla.edu/~junhua.mao/m-RNN.html
arxiv: http://arxiv.org/abs/1412.6632
github: https://github.com/mjhucla/mRNN-CR
github: https://github.com/mjhucla/TF-mRNN

Show, Attend and Tell

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML 2015)

project page: http://kelvinxu.github.io/projects/capgen.html
arxiv: http://arxiv.org/abs/1502.03044
github: https://github.com/kelvinxu/arctic-captions
github: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow
github(TensorFlow): https://github.com/yunjey/show-attend-and-tell-tensorflow
demo: http://www.cs.toronto.edu/~rkiros/abstract_captions.html

Automatically describing historic photographs

website: https://staff.fnwi.uva.nl/d.elliott/loc/

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

arxiv: http://arxiv.org/abs/1504.06692
homepage: http://www.stat.ucla.edu/~junhua.mao/projects/child_learning.html
github: https://github.com/mjhucla/NVC-Dataset

What value do explicit high level concepts have in vision to language problems?

arxiv: http://arxiv.org/abs/1506.01144

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

arxiv: http://arxiv.org/abs/1506.06272

Learning FRAME Models Using CNN Filters for Knowledge Visualization (CVPR 2015)

project page: http://www.stat.ucla.edu/~yang.lu/project/deepFrame/main.html
arxiv: http://arxiv.org/abs/1509.08379
code+data: http://www.stat.ucla.edu/~yang.lu/project/deepFrame/doc/deepFRAME_1.1.zip

Generating Images from Captions with Attention

arxiv: http://arxiv.org/abs/1511.02793
github: https://github.com/emansim/text2image
demo: http://www.cs.toronto.edu/~emansim/cap2im.html

Order-Embeddings of Images and Language

arxiv: http://arxiv.org/abs/1511.06361
github: https://github.com/ivendrov/order-embedding

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

project page: http://cs.stanford.edu/people/karpathy/densecap/
arxiv: http://arxiv.org/abs/1511.07571
github(Torch): https://github.com/jcjohnson/densecap

Expressing an Image Stream with a Sequence of Natural Sentences

intro: NIPS 2015. CRCN
nips-page: http://papers.nips.cc/paper/5776-expressing-an-image-stream-with-a-sequence-of-natural-sentences
paper: http://papers.nips.cc/paper/5776-expressing-an-image-stream-with-a-sequence-of-natural-sentences.pdf
paper: http://www.cs.cmu.edu/~gunhee/publish/nips15_stream2text.pdf
author-page: http://www.cs.cmu.edu/~gunhee/
github: https://github.com/cesc-park/CRCN

Multimodal Pivots for Image Caption Translation

intro: ACL 2016
arxiv: http://arxiv.org/abs/1601.03916

Image Captioning with Deep Bidirectional LSTMs

intro: ACMMM 2016
arxiv: http://arxiv.org/abs/1604.00790
github(Caffe): https://github.com/deepsemantic/image_captioning
demo: https://youtu.be/a0bh9_2LE24

Encode, Review, and Decode: Reviewer Module for Caption Generation

Review Network for Caption Generation

intro: NIPS 2016
arxiv: https://arxiv.org/abs/1605.07912
github: https://github.com/kimiyoung/review_net

Attention Correctness in Neural Image Captioning

arxiv: http://arxiv.org/abs/1605.09553

Image Caption Generation with Text-Conditional Semantic Attention

arxiv: https://arxiv.org/abs/1606.04621
github: https://github.com/LuoweiZhou/e2e-gLSTM-sc

DeepDiary: Automatic Caption Generation for Lifelogging Image Streams

intro: ECCV International Workshop on Egocentric Perception, Interaction, and Computing
arxiv: http://arxiv.org/abs/1608.03819

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

intro: ACCV 2016
arxiv: http://arxiv.org/abs/1608.05813

Captioning Images with Diverse Objects

arxiv: http://arxiv.org/abs/1606.07770

Learning to generalize to new compositions in image understanding

arxiv: http://arxiv.org/abs/1608.07639

Generating captions without looking beyond objects

intro: ECCV2016 2nd Workshop on Storytelling with Images and Videos (VisStory)
arxiv: https://arxiv.org/abs/1610.03708

SPICE: Semantic Propositional Image Caption Evaluation

intro: ECCV 2016
project page: http://www.panderson.me/spice/
paper: http://www.panderson.me/images/SPICE.pdf
github: https://github.com/peteanderson80/SPICE

Boosting Image Captioning with Attributes

arxiv: https://arxiv.org/abs/1611.01646

Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning

arxiv: https://arxiv.org/abs/1611.05321

A Hierarchical Approach for Generating Descriptive Image Paragraphs

intro: Stanford University
arxiv: https://arxiv.org/abs/1611.06607

Dense Captioning with Joint Inference and Visual Context

intro: Snap Inc.
arxiv: https://arxiv.org/abs/1611.06949

Optimization of image description metrics using policy gradient methods

intro: University of Oxford & Google
arxiv: https://arxiv.org/abs/1612.00370

Areas of Attention for Image Captioning

arxiv: https://arxiv.org/abs/1612.01033

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

intro: CVPR 2017
arxiv: https://arxiv.org/abs/1612.01887
github: https://github.com/jiasenlu/AdaptiveAttention

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

arxiv: https://arxiv.org/abs/1612.04949

Recurrent Highway Networks with Language CNN for Image Captioning

arxiv: https://arxiv.org/abs/1612.07086

Top-down Visual Saliency Guided by Captions

arxiv: https://arxiv.org/abs/1612.07360
github: https://github.com/VisionLearningGroup/caption-guided-saliency

MAT: A Multimodal Attentive Translator for Image Captioning

https://arxiv.org/abs/1702.05658

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

intro: Snap Inc & Google Inc
arxiv: https://arxiv.org/abs/1704.03899

Attend to You: Personalized Image Captioning with Context Sequence Memory Networks

intro: CVPR 2017
arxiv: https://arxiv.org/abs/1704.06485
github: https://github.com/cesc-park/attend2u

Punny Captions: Witty Wordplay in Image Descriptions

https://arxiv.org/abs/1704.08224

Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

https://arxiv.org/abs/1705.00930

Actor-Critic Sequence Training for Image Captioning

intro: Queen Mary University of London & Yang’s Accounting Consultancy Ltd
keywords: actor-critic reinforcement learning
arxiv: https://arxiv.org/abs/1706.09601

What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?

intro: Proceedings of the 10th International Conference on Natural Language Generation (INLG’17)
arxiv: https://arxiv.org/abs/1708.02043

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

https://arxiv.org/abs/1709.03376

Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning

https://arxiv.org/abs/1709.05038

Contrastive Learning for Image Captioning

intro: NIPS 2017
arxiv: https://arxiv.org/abs/1710.02534

Phrase-based Image Captioning with Hierarchical LSTM Model

intro: ACCV2016 extension, phrase-based image captioning
arxiv: https://arxiv.org/abs/1711.05557

Convolutional Image Captioning

https://arxiv.org/abs/1711.09151

Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning

https://arxiv.org/abs/1712.02051

Improved Image Captioning with Adversarial Semantic Alignment

intro: IBM Research
arxiv: https://arxiv.org/abs/1805.00063

Object Counts! Bringing Explicit Detections Back into Image Captioning

intro: NAACL 2018
arxiv: https://arxiv.org/abs/1805.00314

Defoiling Foiled Image Captions

intro: NAACL 2018
arxiv: https://arxiv.org/abs/1805.06549

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1805.07030

Improving Image Captioning with Conditional Generative Adversarial Nets

https://arxiv.org/abs/1805.07112

CNN+CNN: Convolutional Decoders for Image Captioning

https://arxiv.org/abs/1805.09019

Diverse and Controllable Image Captioning with Part-of-Speech Guidance

https://arxiv.org/abs/1805.12589

Learning to Evaluate Image Captioning

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1806.06422

Topic-Guided Attention for Image Captioning

intro: ICIP 2018
arxiv: https://arxiv.org/abs/1807.03514

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

intro: ACM MM 2018 oral
arxiv: https://arxiv.org/abs/1808.05864
github: https://github.com/daqingliu/CAVP

Exploring Visual Relationship for Image Captioning

intro: ECCV 2018
arxiv: https://arxiv.org/abs/1809.07041

Boosted Attention: Leveraging Human Attention for Image Captioning

intro: ECCV 2018
arxiv: https://arxiv.org/abs/1904.00767

Image Captioning as Neural Machine Translation Task in SOCKEYE

https://arxiv.org/abs/1810.04101

Unsupervised Image Captioning

https://arxiv.org/abs/1811.10787

Attend More Times for Image Captioning

https://arxiv.org/abs/1812.03283

Object Descriptions

Generation and Comprehension of Unambiguous Object Descriptions

arxiv: https://arxiv.org/abs/1511.02283
github: https://github.com/mjhucla/Google_Refexp_toolbox

Video Captioning / Description

Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework

intro: AAAI 2015
paper: http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_AAAI2015_v2t.pdf

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

intro: NAACL-HLT 2015 camera ready
project page: https://www.cs.utexas.edu/~vsub/naacl15_project.html
arxiv: http://arxiv.org/abs/1412.4729
slides: https://www.cs.utexas.edu/~vsub/pdf/Translating_Videos_slides.pdf
code+data: https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

Describing Videos by Exploiting Temporal Structure

arxiv: http://arxiv.org/abs/1502.08029
github: https://github.com/yaoli/arctic-capgen-vid

SA-tensorflow: Soft attention mechanism for video caption generation

github: https://github.com/tsenghungchen/SA-tensorflow

Sequence to Sequence – Video to Text

intro: ICCV 2015. S2VT
project page: http://vsubhashini.github.io/s2vt.html
arxiv: http://arxiv.org/abs/1505.00487
slides: https://www.cs.utexas.edu/~vsub/pdf/S2VT_slides.pdf
github(Caffe): https://github.com/vsubhashini/caffe/tree/recurrent/examples/s2vt
github(TensorFlow): https://github.com/jazzsaxmafia/video_to_sequence

Jointly Modeling Embedding and Translation to Bridge Video and Language

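arxiv: http://arxiv.org/abs/1505.01861
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Pan_Jointly_Modeling_Embedding_CVPR_2016_paper.pdf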

Video Description using Bidirectional Recurrent Neural Networks

arxiv: http://arxiv.org/abs/1604.03390

Bidirectional Long-Short Term Memory for Video Description

arxiv: https://arxiv.org/abs/1606.04631

3 Ways to Subtitle and Caption Your Videos Automatically Using Artificial Intelligence

blog: http://photography.tutsplus.com/tutorials/3-ways-to-subtitle-and-caption-your-videos-automatically-using-artificial-intelligence--cms-26834

Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

arxiv: http://arxiv.org/abs/1608.04959

Grounding and Generation of Natural Language Descriptions for Images and Videos

intro: Anna Rohrbach. Allen Institute for Artificial Intelligence (AI2)
youtube: https://www.youtube.com/watch?v=fE3FX8FowiU

Video Captioning and Retrieval Models with Semantic Attention

intro: Winner of three (fill-in-the-blank, multiple-choice test, and movie retrieval) out of four tasks of the LSMDC 2016 Challenge (Workshop in ECCV 2016)
arxiv: https://arxiv.org/abs/1610.02947

Spatio-Temporal Attention Models for Grounded Video Captioning

arxiv: https://arxiv.org/abs/1610.04997

Video and Language: Bridging Video and Language with Deep Learning

intro: ECCV-MM 2016. captioning, commenting, alignment
slides: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/10/Video-and-Language-ECCV-MM-2016-Tao-Mei-Pub.pdf

Recurrent Memory Addressing for describing videos

arxiv: https://arxiv.org/abs/1611.06492

Video Captioning with Transferred Semantic Attributes

arxiv: https://arxiv.org/abs/1611.07675

Adaptive Feature Abstraction for Translating Video to Language

arxiv: https://arxiv.org/abs/1611.07837

Semantic Compositional Networks for Visual Captioning

intro: CVPR 2017. Duke University & Tsinghua University & MSR
arxiv: https://arxiv.org/abs/1611.08002
github: https://github.com/zhegan27/SCN_for_video_captioning

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

arxiv: https://arxiv.org/abs/1611.09312

Attention-Based Multimodal Fusion for Video Description

arxiv: https://arxiv.org/abs/1701.03126

Weakly Supervised Dense Video Captioning

intro: CVPR 2017
arxiv: https://arxiv.org/abs/1704.01502

Generating Descriptions with Grounded and Co-Referenced People

intro: CVPR 2017. movie description
arxiv: https://arxiv.org/abs/1704.01518

Multi-Task Video Captioning with Video and Entailment Generation

intro: ACL 2017. UNC Chapel Hill
arxiv: https://arxiv.org/abs/1704.07489

Dense-Captioning Events in Videos

project page: http://cs.stanford.edu/people/ranjaykrishna/densevid/
arxiv: https://arxiv.org/abs/1705.00754

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

https://arxiv.org/abs/1706.01231

Reinforced Video Captioning with Entailment Rewards

intro: EMNLP 2017. UNC Chapel Hill
arxiv: https://arxiv.org/abs/1708.02300

End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering

intro: CVPR 2017. Winner of three (fill-in-the-blank, multiple-choice test, and movie retrieval) out of four tasks of the LSMDC 2016 Challenge
arxiv: https://arxiv.org/abs/1610.02947
slides: https://drive.google.com/file/d/0B9nOObAFqKC9aHl2VWJVNFp1bFk/view

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

https://arxiv.org/abs/1708.02478

Grounded Objects and Interactions for Video Captioning

https://arxiv.org/abs/1711.06354

Integrating both Visual and Audio Cues for Enhanced Video Caption

https://arxiv.org/abs/1711.08097

Video Captioning via Hierarchical Reinforcement Learning

https://arxiv.org/abs/1711.11135

Consensus-based Sequence Training for Video Captioning

https://arxiv.org/abs/1712.09532

Less Is More: Picking Informative Frames for Video Captioning

https://arxiv.org/abs/1803.01457

End-to-End Video Captioning with Multitask Reinforcement Learning

https://arxiv.org/abs/1803.07950

End-to-End Dense Video Captioning with Masked Transformer

intro: CVPR 2018. University of Michigan & Salesforce Research
arxiv: https://arxiv.org/abs/1804.00819

Reconstruction Network for Video Captioning

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1803.11438

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

intro: CVPR 2018 spotlight paper
arxiv: https://arxiv.org/abs/1804.00100

Jointly Localizing and Describing Events for Dense Video Captioning

intro: CVPR 2018 Spotlight, Rank 1 in ActivityNet Captions Challenge 2017
arxiv: https://arxiv.org/abs/1804.08274

Contextualize, Show and Tell: A Neural Visual Storyteller

https://arxiv.org/abs/1806.00738

RUC+CMU: System Report for Dense Captioning Events in Videos

intro: Winner in ActivityNet 2018 Dense Video Captioning challenge
arxiv: https://arxiv.org/abs/1806.08854

Streamlined Dense Video Captioning

intro: CVPR 2019
arxiv: https://arxiv.org/abs/1904.03870

Projects

Learning CNN-LSTM Architectures for Image Caption Generation: An implementation of CNN-LSTM image caption generator architecture that achieves close to state-of-the-art results on the MSCOCO dataset.

github: https://github.com/mosessoh/CNN-LSTM-Caption-Generator

screengrab-caption: an openframeworks app that live-captions your desktop screen with a neural net

intro: openFrameworks app that grabs your desktop screen and sends it to darknet for captioning. Works well with video calls.
github: https://github.com/genekogan/screengrab-caption

Tools

CaptionBot (Microsoft)

website: https://www.captionbot.ai/

Blogs

Captioning Novel Objects in Images

http://bair.berkeley.edu/jacky/2017/08/08/novel-object-captioning/

Deep Learning and Autonomous Driving

Courses

(Toronto) CSC2541: Visual Perception for Autonomous Driving, Winter 2016

homepage: http://www.cs.toronto.edu/~urtasun/courses/CSC2541/CSC2541_Winter16.html

(MIT) 6.S094: Deep Learning for Self-Driving Cars

homepage: http://selfdrivingcars.mit.edu/
github: https://github.com/lexfridman/deepcars
youtube: https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf
mirror: https://pan.baidu.com/s/1boLRFaB

How to Land An Autonomous Vehicle Job: Coursework

blog: https://medium.com/self-driving-cars/how-to-land-an-autonomous-vehicle-job-coursework-e7acc2bfe740#.7vfjx3i1j

Papers

An Empirical Evaluation of Deep Learning on Highway Driving

arxiv: http://arxiv.org/abs/1504.01716
github: https://github.com/brodyh/caffe

Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving

intro: NeurIPS 2018 Workshop on Machine Learning on the Phone and other Consumer Devices (MLPCD 2)
arxiv: https://arxiv.org/abs/1901.03912

Optical Flow augmented Semantic Segmentation networks for Automated Driving

intro: VISAPP 2019 Oral
arxiv: https://arxiv.org/abs/1901.07355

AuxNet: Auxiliary tasks enhanced Semantic Segmentation for Automated Driving

intro: Short Paper for a poster presentation at VISAPP 2019
arxiv: https://arxiv.org/abs/1901.05808

Design of Real-time Semantic Segmentation Decoder for Automated Driving

intro: VISAPP 2019
arxiv: https://arxiv.org/abs/1901.06580

Hierarchical Multi-task Deep Neural Network Architecture for End-to-End Driving

https://arxiv.org/abs/1902.03466

DeepDriving

DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving

project page: http://deepdriving.cs.princeton.edu/
paper: http://deepdriving.cs.princeton.edu/paper.pdf
code: http://deepdriving.cs.princeton.edu/DeepDriving.zip

End to End Learning for Self-Driving Cars

intro: NVIDIA DevBox and Torch 7, 30 FPS
arxiv: http://arxiv.org/abs/1604.07316
blog: https://devblogs.nvidia.com/parallelforall/deep-learning-self-driving-cars/
demo: https://www.youtube.com/watch?v=NJU9ULQUwng&feature=youtu.be
github: https://github.com/SullyChen/Nvidia-Autopilot-TensorFlow

End-to-End Deep Learning for Self-Driving Cars

blog: https://devblogs.nvidia.com/parallelforall/deep-learning-self-driving-cars/

Can we unify monocular detectors for autonomous driving by using the pixel-wise semantic segmentation of CNNs?

arxiv: http://arxiv.org/abs/1607.00971

BRAIN4CARS: Cabin Sensing for Safe and Personalized Driving

Brain4Cars: Sensory-Fusion Recurrent Neural Models for Driver Activity Anticipation

Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture

arxiv: http://arxiv.org/abs/1601.00740

Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models

arxiv: http://arxiv.org/abs/1504.02789
github: https://github.com/asheshjain399/ICCV2015_Brain4Cars

Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture

project page: http://www.brain4cars.com/
arxiv: http://arxiv.org/abs/1509.05016
github: https://github.com/asheshjain399/RNNexp

Long-term Planning by Short-term Prediction

arxiv: http://arxiv.org/abs/1602.01580

Learning a Driving Simulator

intro: by hacker Geohot
project page: http://research.comma.ai/
arxiv: http://arxiv.org/abs/1608.01230
paper: https://github.com/commaai/research/blob/master/paper/commalds.pdf
github: https://github.com/commaai/research

Comma.ai open-sources the data it used for its first successful driverless trips

blog: https://techcrunch.com/2016/08/03/comma-ai-open-sources-the-data-it-used-for-its-first-successful-driverless-trips/

Autonomous driving challenge: To Infer the property of a dynamic object based on its motion pattern using recurrent neural network

arxiv: http://arxiv.org/abs/1609.00361

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

arxiv: https://arxiv.org/abs/1610.03295

Learning from Maps: Visual Common Sense for Autonomous Driving

arxiv: https://arxiv.org/abs/1611.08583

SAD-GAN: Synthetic Autonomous Driving using Generative Adversarial Networks

intro: Accepted at the Deep Learning for Action and Interaction Workshop, 30th Conference on Neural Information Processing Systems (NIPS 2016)
arxiv: https://arxiv.org/abs/1611.08788

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

intro: First place on the KITTI road segmentation benchmark. Joint classification, detection, and semantic segmentation via a unified architecture; all tasks run in under 100 ms
arxiv: https://arxiv.org/abs/1612.07695
github: https://github.com/MarvinTeichmann/MultiNet

Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

intro: UC Berkeley
arxiv: https://arxiv.org/abs/1703.10631

Virtual to Real Reinforcement Learning for Autonomous Driving

intro: Shanghai Jiao Tong University & UC Berkeley & Tsinghua University
arxiv: https://arxiv.org/abs/1704.03952

Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art

homepage: http://www.cvlibs.net/projects/autonomous_vision_survey/
arxiv: https://arxiv.org/abs/1704.05519

Deep Reinforcement Learning framework for Autonomous Driving

https://arxiv.org/abs/1704.02532

Systematic Testing of Convolutional Neural Networks for Autonomous Driving

https://arxiv.org/abs/1708.03309

MODNet: Moving Object Detection Network with Motion and Appearance for Autonomous Driving

https://arxiv.org/abs/1709.04821

CFENet: An Accurate and Efficient Single-Shot Object Detector for Autonomous Driving

intro: CVPR 2018 Workshop of Autonomous Driving (WAD)
arxiv: https://arxiv.org/abs/1806.09790

LaneNet: Real-Time Lane Detection Networks for Autonomous Driving

intro: Duke University & Horizon Robotics, Inc.
arxiv: https://arxiv.org/abs/1807.01726

Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision

https://arxiv.org/abs/1808.10393

Rethinking Self-driving: Multi-task Knowledge for Better Generalization and Accident Explanation Ability

intro: Waseda University
arxiv: https://arxiv.org/abs/1809.11100
demo: https://www.youtube.com/watch?v=N7ePnnZZwdE

Pixel and Feature Level Based Domain Adaption for Object Detection in Autonomous Driving

https://arxiv.org/abs/1810.00345

Multi-task Learning with Attention for End-to-end Autonomous Driving

intro: CVPR 2021 Workshop on Autonomous Driving
arxiv: https://arxiv.org/abs/2104.10753

Projects

Caffe-Autopilot: Car autopilot software that uses C++, BVLC Caffe, OpenCV, and SFML

github: https://github.com/SullyChen/Caffe-Autopilot

Self Driving Car Demo

intro: A project that trains a virtual car to drive itself (move around a screen without running into obstacles) using a type of reinforcement learning called Q-Learning
github: https://github.com/llSourcell/Self-Driving-Car-Demo/

Autoware: Open-source software for urban autonomous driving

github: https://github.com/CPFL/Autoware

Open Sourcing 223GB of Driving Data

Machine Learning for RC Cars

github: https://github.com/kendricktan/suiron

Self Driving (Toy) Ferrari

github: https://github.com/RyanZotti/Self-Driving-Car

Lane Finding Project for Self-Driving Car ND

github: https://github.com/udacity/CarND-LaneLines-P1

Instructions on how to get your development environment ready for Udacity Self Driving Car (SDC) Challenges

github: https://github.com/gtarobotics/self-driving-car

DeepDrive: self-driving car AI

intro: Caffe Model / Dataset / Tips and Tricks
homepage: http://deepdrive.io/

DeepDrive setup: Run a self-driving car simulator from the comfort of your own PC

github: https://github.com/crizCraig/deepdrive

DeepTesla: End-to-End Learning from Human and Autopilot Driving

http://selfdrivingcars.mit.edu/deeptesla/

DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car

arxiv: https://arxiv.org/abs/1712.08644
github: https://github.com/heechul/picar

Autonomous Driving in Reality with Reinforcement Learning and Image Translation

intro: Shanghai Jiao Tong University
arxiv: https://arxiv.org/abs/1801.05299

End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perception

https://arxiv.org/abs/1801.06734

Blogs

Self-driving cars: How far away are we REALLY from autonomous cars? (7 Aug 2015)

http://www.alphr.com/cars/1001329/self-driving-cars-how-far-away-are-we-really-from-autonomous-cars

Practice makes perfect: Driverless cars will learn from their mistakes (9 Oct 2015)

http://www.alphr.com/cars/1001713/practice-makes-perfect-driverless-cars-will-learn-from-their-mistakes

Eyes on the Road: How Autonomous Cars Understand What They’re Seeing

blog: http://blogs.nvidia.com/blog/2016/01/05/eyes-on-the-road-how-autonomous-cars-understand-what-theyre-seeing/

Human-in-the-loop deep learning will help drive autonomous cars

http://venturebeat.com/2016/06/25/human-in-the-loop-deep-learning-will-help-drive-autonomous-cars/

Using reinforcement learning in Python to teach a virtual car to avoid obstacles

Autonomous RC car using Raspberry Pi and Neural Networks

The Road Ahead: Autonomous Vehicles Startup Ecosystem

https://medium.com/the-mission/the-road-ahead-autonomous-vehicles-startup-ecosystem-3c91d546673d#.gft1xyh9l

Deep Driving - A revolutionary AI technique is about to transform the self-driving car

https://www.technologyreview.com/s/602600/deep-driving/

Visualizations for regressing wheel steering angles in self driving cars with Keras

Published: 09 Oct 2015

Audio / Image / Video Generation

Papers

Optimizing Neural Networks That Generate Images

intro: 2014 PhD thesis
paper: http://www.cs.toronto.edu/~tijmen/tijmen_thesis.pdf
github: https://github.com/mrkulk/Unsupervised-Capsule-Network

Learning to Generate Chairs, Tables and Cars with Convolutional Networks

arxiv: http://arxiv.org/abs/1411.5928

DRAW: A Recurrent Neural Network For Image Generation

intro: Google DeepMind
arxiv: http://arxiv.org/abs/1502.04623
github: https://github.com/vivanov879/draw
github(Theano): https://github.com/jbornschein/draw
github(Lasagne): https://github.com/skaae/lasagne-draw
youtube: https://www.youtube.com/watch?v=Zt-7MI9eKEo&hd=1
video: http://pan.baidu.com/s/1gd3W6Fh

What is DRAW (Deep Recurrent Attentive Writer)?

blog: http://kvfrans.com/what-is-draw-deep-recurrent-attentive-writer/
github(tensorflow): https://github.com/kvfrans/draw

Colorizing the DRAW Model

blog: http://kvfrans.com/colorizing-the-draw-model/
github: https://github.com/kvfrans/draw-color

Understanding and Implementing Deepmind’s DRAW Model

blog: http://evjang.com/articles/draw
github: https://github.com/ericjang/draw

Generative Image Modeling Using Spatial LSTMs

arxiv: http://arxiv.org/abs/1506.03478
github: https://github.com/lucastheis/ride/

Conditional generative adversarial nets for convolutional face generation

Generating Images from Captions with Attention

arxiv: http://arxiv.org/abs/1511.02793
github: https://github.com/emansim/text2image
demo: http://www.cs.toronto.edu/~emansim/cap2im.html

Attribute2Image: Conditional Image Generation from Visual Attributes

intro: University of Michigan & Adobe Research & NEC Labs
project page: https://sites.google.com/site/attribute2image/
arxiv: http://arxiv.org/abs/1512.00570
github(Torch): https://github.com/xcyan/eccv16_attr2img

Autoencoding beyond pixels using a learned similarity metric

arxiv: http://arxiv.org/abs/1512.09300
demo: http://algoalgebra.csa.iisc.ernet.in/deepimagine/
github: https://github.com/andersbll/autoencoding_beyond_pixels
github(Tensorflow): https://github.com/timsainb/Tensorflow-MultiGPU-VAE-GAN
video: http://video.weibo.com/show?fid=1034:f00b4e5a34e8c1ebe78ccd00da95f9e0
github: https://github.com/stitchfix/fauxtograph

Deep Visual Analogy-Making

paper: https://papers.nips.cc/paper/5845-deep-visual-analogy-making
github(Tensorflow): https://github.com/carpedm20/visual-analogy-tensorflow
slides: http://slideplayer.com/slide/9147672/
mirror: http://pan.baidu.com/s/1pKgrdnt

Pixel Recurrent Neural Networks

intro: Google DeepMind. ICML 2016 best paper. PixelRNN
arxiv: http://arxiv.org/abs/1601.06759
github: https://github.com/igul222/pixel_rnn
github(Tensorflow): https://github.com/carpedm20/pixel-rnn-tensorflow
notes(by Hugo Larochelle): https://www.evernote.com/shard/s189/sh/fdf61a28-f4b6-491b-bef1-f3e148185b18/aba21367d1b3730d9334ed91d3250848
video(by Hugo Larochelle): https://www.periscope.tv/hugo_larochelle/1ypKdnMkjBnJW

Generating images with recurrent adversarial networks

arxiv: http://arxiv.org/abs/1602.05110
github: https://github.com/jiwoongim/GRAN

Pixel-Level Domain Transfer

intro: ECCV 2016
github(Torch): https://github.com/fxia22/PixelDTGAN
author page(Code and dataset): https://dgyoo.github.io/

Generative Adversarial Text to Image Synthesis

intro: ICML 2016
arxiv: http://arxiv.org/abs/1605.05396
project page: https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/embeddings-for-image-classification/generative-adversarial-text-to-image-synthesis/
github: https://github.com/reedscot/icml2016
code+dataset: http://datasets.d2.mpi-inf.mpg.de/akata/cub_txt.tar.gz

Conditional Image Generation with PixelCNN Decoders

intro: Google DeepMind. PixelCNN 2.0
arxiv: http://arxiv.org/abs/1606.05328
github(Theano): https://github.com/kundan2510/pixelCNN
github(Torch): https://github.com/dritchie/pixelCNN
github(Tensorflow): https://github.com/anantzoid/Conditional-PixelCNN-decoder

Inverting face embeddings with convolutional neural networks

arxiv: http://arxiv.org/abs/1606.04189
github: https://github.com/pavelgonchar/face-transfer-tensorflow

Unsupervised Cross-Domain Image Generation

intro: Facebook AI Research. Domain Transfer Network (DTN)
arxiv: https://arxiv.org/abs/1611.02200
github(TensorFlow): https://github.com/yunjey/dtn-tensorflow

PixelCNN++: A PixelCNN Implementation with Discretized Logistic Mixture Likelihood and Other Modifications

intro: OpenAI
arxiv: https://arxiv.org/abs/1701.05517
paper: http://openreview.net/pdf?id=BJrFC6ceg
github: https://github.com/openai/pixel-cnn

Generating Interpretable Images with Controllable Structure

intro: Google DeepMind
paper: http://www.scottreed.info/files/iclr2017.pdf

Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts

arxiv: https://arxiv.org/abs/1612.00215

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

intro: University of Wyoming & Geometric Intelligence & Montreal Institute for Learning Algorithms & University of Freiburg
project page: http://www.evolvingai.org/ppgn
paper: http://www.evolvingai.org/files/nguyen2016ppgn_v1.pdf
github: https://github.com/Evolving-AI-Lab/ppgn

Image Generation and Editing with Variational Info Generative Adversarial Networks

arxiv: https://arxiv.org/abs/1701.04568

DeepFace: Face Generation using Deep Learning

arxiv: https://arxiv.org/abs/1701.01876

Multi-View Image Generation from a Single-View

intro: Southwest Jiaotong University & National University of Singapore
arxiv: https://arxiv.org/abs/1704.04886

Generative Cooperative Net for Image Generation and Data Augmentation

https://arxiv.org/abs/1705.02887

Statistics of Deep Generated Images

https://arxiv.org/abs/1708.02688

Sketch-to-Image Generation Using Deep Contextual Completion

https://arxiv.org/abs/1711.08972

Energy-relaxed Wassertein GANs(EnergyWGAN): Towards More Stable and High Resolution Image Generation

https://arxiv.org/abs/1712.01026

Spatial PixelCNN: Generating Images from Patches

https://arxiv.org/abs/1712.00714

Visual to Sound: Generating Natural Sound for Videos in the Wild

intro: University of North Carolina at Chapel Hill & Adobe Research
project page: http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.html
arxiv: https://arxiv.org/abs/1712.01393

Semi-supervised FusedGAN for Conditional Image Generation

https://arxiv.org/abs/1801.05551

Image Transformer

intro: Google Brain & UC Berkeley
arxiv: https://arxiv.org/abs/1802.05751

Unpaired Multi-Domain Image Generation via Regularized Conditional GANs

https://arxiv.org/abs/1805.02456

Transferring GANs: generating images from limited data

intro: Universitat Autònoma de Barcelona
arxiv: https://arxiv.org/abs/1805.01677
github: https://github.com/yaxingwang/Transferring-GANs

Cross Domain Image Generation through Latent Space Exploration with Adversarial Loss

https://arxiv.org/abs/1805.10130

Face Image Generation

Fader Networks: Manipulating Images by Sliding Attributes

intro: NIPS 2017. Facebook AI Research & Sorbonne Université
arxiv: https://arxiv.org/abs/1706.00409
github: https://github.com/facebookresearch/FaderNetworks

Person Image Generation

Disentangled Person Image Generation

intro: CVPR 2018 spotlight
intro: KU-Leuven/PSI & Max Planck Institute for Informatics & ETH Zurich
arxiv: https://arxiv.org/abs/1712.02621

Pose Guided Person Image Generation

intro: NIPS 2017
arxiv: https://arxiv.org/abs/1705.09368
poster: https://homes.esat.kuleuven.be/~liqianma/NIPS17_PG2/NIPS17_PG2_poster.pdf

Deformable GANs for Pose-based Human Image Generation

intro: University of Trento & Inria Grenoble Rhone-Alpes
arxiv: https://arxiv.org/abs/1801.00055
github: https://github.com/AliaksandrSiarohin/pose-gan

Unpaired Pose Guided Human Image Generation

https://arxiv.org/abs/1901.02284

Video Generation

MoCoGAN: Decomposing Motion and Content for Video Generation

arxiv: https://arxiv.org/abs/1707.04993
github: https://github.com/sergeytulyakov/mocogan
github(PyTorch): https://github.com/DLHacks/mocogan

Attentive Semantic Video Generation using Captions

https://arxiv.org/abs/1708.05980

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

intro: AAAI 2018. The University of Tokyo
project page: http://www.mi.t.u-tokyo.ac.jp/assets/publication/hierarchical_video_generation_sup/
arxiv: https://arxiv.org/abs/1711.09618

Towards an Understanding of Our World by GANing Videos in the Wild

intro: ETH Zurich
arxiv: https://arxiv.org/abs/1711.11453
github: https://github.com/bernhard2202/improved-video-gan

Video Generation from Single Semantic Label Map

intro: CVPR 2019
arxiv: https://arxiv.org/abs/1903.04480
github: https://github.com/junting/seg2vid

Deep Generative Model

Digit Fantasies by a Deep Generative Model

demo: http://www.dpkingma.com/sgvb_mnist_demo/demo.html

Conditional generative adversarial nets for convolutional face generation

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

intro: NIPS 2015
project page: http://soumith.ch/eyescream/
homepage: http://www.cs.nyu.edu/~denton/
arxiv: http://arxiv.org/abs/1506.05751
code: http://soumith.ch/eyescream/
notes: http://colinraffel.com/wiki/deep_generative_image_models_using_a_laplacian_pyramid_of_adversarial_networks

Torch convolutional GAN: Generating Faces with Torch

blog: http://torch.ch/blog/2015/11/13/gan.html
github: https://github.com/skaae/torch-gan

One-Shot Generalization in Deep Generative Models

intro: Google DeepMind. ICML 2016
arxiv: http://arxiv.org/abs/1603.05106

Generative Image Modeling using Style and Structure Adversarial Networks

arxiv: http://arxiv.org/abs/1603.05631
github: https://github.com/xiaolonw/ss-gan

Synthesizing Dynamic Textures and Sounds by Spatial-Temporal Generative ConvNet

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks

arxiv: http://arxiv.org/abs/1605.09304

ArtGAN: Artwork Synthesis with Conditional Categorial GANs

arxiv: https://arxiv.org/abs/1702.03410

Learning to Generate Chairs with Generative Adversarial Nets

https://arxiv.org/abs/1705.10413

Blogs

Torch convolutional GAN: Generating Faces with Torch

blog: http://torch.ch/blog/2015/11/13/gan.html
github: https://github.com/skaae/torch-gan

Generating Large Images from Latent Vectors

http://blog.otoro.net/2016/04/01/generating-large-images-from-latent-vectors/

Generating Faces with Deconvolution Networks

blog: https://zo7.github.io/blog/2016/09/25/generating-faces.html
github: https://github.com/zo7/facegen

Attention Models in Image and Caption Generation

blog: https://casmls.github.io/general/2016/10/16/attention_model.html

Deconvolution and Checkerboard Artifacts

:star::star::star::star::star:
intro: Google Brain & Université de Montréal
blog: http://distill.pub/2016/deconv-checkerboard/

Projects

TF-VAE-GAN-DRAW

intro: A collection of generative methods implemented with TensorFlow (Deep Convolutional Generative Adversarial Networks (DCGAN), Variational Autoencoder (VAE) and DRAW: A Recurrent Neural Network For Image Generation).
github: https://github.com/ikostrikov/TensorFlow-VAE-GAN-DRAW

Generating Large Images from Latent Vectors

project page: http://blog.otoro.net/2016/04/01/generating-large-images-from-latent-vectors/
github: https://github.com/hardmaru/cppn-gan-vae-tensorflow

Generating Large Images from Latent Vectors - Part Two

Analyzing 50k fonts using deep neural networks

Generate cat images with neural networks

intro: GAN, spatial transformers, weight initialization and LeakyReLUs.
github: https://github.com/aleju/cat-generator

Generate human faces with neural networks

github: https://github.com/aleju/face-generator

A TensorFlow implementation of DeepMind’s WaveNet paper

intro: This is a TensorFlow implementation of the WaveNet generative neural network architecture for image generation.
github: https://github.com/Zeta36/tensorflow-image-wavenet

Published: 09 Oct 2015

Adversarial Attacks and Defences

Papers

Published: 09 Oct 2015

Recognition, Detection, Segmentation and Tracking

Classification / Recognition

Published: 09 Oct 2015
