
Overview

Figure 4: Architecture of Curious Dr. MISFA. It includes (a) a reinforcement learning agent that generates an observation-stream selection policy based on the intrinsic rewards, (b) an adaptive Incremental SFA module coupled with Robust Online Clustering that updates an abstraction based on the incoming observations, and (c) a gating system that prevents re-encoding observations that have already been encoded.

Figure 5: Learning Problem. Given a set of time-varying observation streams, an abstraction corresponding to the easiest-to-encode yet still unknown observation stream is learned first. (Left) An example result after the first abstraction has been learned. (Right) $\Omega ({\bf x})$ denotes the curiosity function that maps an observation stream ${\bf x}$ to a scalar value in $[0,1]$. Difficult-to-encode observation streams have higher $\Omega$ values. The figure shows the desired result: a sequence of abstractions $\{\phi_i\}_{i \in \mathbb{N}}$ learned in the order of increasing $\Omega$-values of the observation streams that they encode. However, $\Omega$ is not known a priori; therefore, the learning process involves estimating both the abstractions and the $\Omega$-values. The curved arrow indicates the temporal evolution of the learning process.

Figure 6: See text for details.

IncSFA extracts slow features without storing or estimating computationally expensive covariance matrices of the input data. This makes IncSFA well suited to applications with high-dimensional images as inputs. However, like most online-learning approaches, IncSFA gradually forgets previously learned representations if the statistics of the inputs change over time. Curiosity-Driven Modular Incremental Slow Feature Analysis (Curious Dr. MISFA) uses the theory of artificial curiosity to address this forgetting problem by retaining what was previously learned in the form of expert modules. From a set of input video streams, Curious Dr. MISFA actively learns multiple expert modules comprising slow feature abstractions in the order of increasing learning difficulty, with theoretical guarantees.
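As a rough illustration of the kind of update IncSFA performs, here is a minimal sketch assuming CCIPCA for incremental whitening and a simplified anti-Hebbian minor-component rule on the derivative of the whitened signal. The published IncSFA uses a specific MCA rule, amnesic learning rates, and sequential extraction of several slow features, so treat this as a sketch rather than the actual implementation.

```python
import numpy as np

class IncSFASketch:
    """Sketch of an incremental slow feature learner (not the published
    IncSFA): CCIPCA whitening plus a simplified minor-component rule."""

    def __init__(self, input_dim, n_slow=1, eta_pca=0.005, eta_mca=0.01):
        self.d = input_dim
        self.eta_pca = eta_pca
        self.eta_mca = eta_mca
        self.mean = np.zeros(input_dim)
        # CCIPCA eigenvector estimates; the norm of each row estimates
        # the corresponding eigenvalue of the input covariance
        self.v = np.eye(input_dim)
        # weight vectors for the slow directions in whitened space
        self.w = np.random.randn(n_slow, input_dim) * 0.1
        self.prev_z = None

    def _whiten(self, u):
        """One CCIPCA step per component; returns the whitened sample."""
        z = np.zeros(self.d)
        for i in range(self.d):
            norm = np.linalg.norm(self.v[i]) + 1e-12
            # pull the eigenvector estimate toward the (deflated) input
            self.v[i] = (1 - self.eta_pca) * self.v[i] + \
                        self.eta_pca * (u @ self.v[i] / norm) * u
            norm = np.linalg.norm(self.v[i]) + 1e-12
            direction = self.v[i] / norm
            proj = u @ direction
            z[i] = proj / np.sqrt(norm)       # divide by sqrt(eigenvalue)
            u = u - proj * direction          # deflate for the next component
        return z

    def update(self, x):
        """Process one observation; returns the current slow-feature output."""
        self.mean = (1 - self.eta_pca) * self.mean + self.eta_pca * x
        z = self._whiten(x - self.mean)
        if self.prev_z is not None:
            dz = z - self.prev_z
            for k in range(len(self.w)):
                # anti-Hebbian step: move w_k away from high-variance
                # directions of the derivative signal, then renormalize
                self.w[k] -= self.eta_mca * (dz @ self.w[k]) * dz
                self.w[k] /= np.linalg.norm(self.w[k]) + 1e-12
        self.prev_z = z
        return z @ self.w.T
```

Extracting more than one slow feature this way would additionally require a deflation step between the minor-component vectors, which the sketch omits.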

Figure 4 shows the architecture of Curious Dr. MISFA.
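A minimal sketch of the loop in Figure 4 follows, with a toy stand-in for the adaptive IncSFA-ROC module and a simple running-average value update in place of the paper's reinforcement-learning machinery. The names ToyModule and curious_dr_misfa_sketch are hypothetical, and the numeric constants are illustrative.

```python
import random

class ToyModule:
    """Placeholder for the adaptive IncSFA-ROC module of Figure 4: it only
    tracks a decaying estimation-error proxy so the loop runs end to end."""
    def __init__(self):
        self.error = 1.0                      # stands in for the ROC error xi

    def update(self, difficulty):
        # harder streams reduce the estimation error more slowly
        old = self.error
        self.error *= (1.0 - 0.01 / difficulty)
        return old - self.error               # learning progress on this step


def curious_dr_misfa_sketch(difficulties=(1.0, 2.0, 4.0), delta=0.05,
                            eps_decay=0.9995, steps=100000):
    """Sketch of the Curious Dr. MISFA loop: an agent picks which stream to
    attend to (stay/switch), one adaptive module encodes it, the gating
    system blanks rewards for already-encoded streams, and the module is
    frozen into the abstraction set once its error falls below delta."""
    phi = [None] * len(difficulties)          # frozen abstractions per stream
    module, eps, q = ToyModule(), 1.0, {}
    state = random.randrange(len(difficulties))

    for _ in range(steps):
        if all(m is not None for m in phi):
            break                             # every stream has been encoded
        # epsilon-greedy choice between staying and switching streams
        if random.random() < eps:
            action = random.choice(["stay", "switch"])
        else:
            action = max(["stay", "switch"],
                         key=lambda a: q.get((state, a), 0.0))
        if action == "switch":
            state = random.choice([s for s in range(len(difficulties))
                                   if s != state])

        if phi[state] is not None:
            reward = 0.0                      # gating: no reward when encoded
        else:
            reward = module.update(difficulties[state])

        # running-average value estimate (stand-in for the paper's RL update)
        key = (state, action)
        q[key] = q.get(key, 0.0) + 0.1 * (reward - q.get(key, 0.0))
        eps *= eps_decay

        if module.error < delta:              # freeze and start a fresh module
            phi[state] = module
            module, eps, q = ToyModule(), 1.0, {}
    return phi
```

Because progress is highest on the easiest not-yet-encoded stream, the greedy policy settles there first, which is the ordering behaviour described above.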

Figure 6 shows a simple proof-of-concept experiment conducted using three 2D oscillatory audio streams. The environment has three internal states $\mathcal{S}^{\text{int}} = \{s^{\text{int}}_1, s^{\text{int}}_2, s^{\text{int}}_3\}$ associated with the observation streams (Figure 6(a)). The stream ${\bf x}_1$ is the easiest signal to encode, followed by ${\bf x}_2$ and ${\bf x}_3$. The dynamics of the algorithm can be observed by studying the time-varying reward function $R^{\text{int}}$ and the ROC estimation error $\xi$. Figure 6(b) shows the reward function for a single run of the experiment. Solid lines represent the reward for the action stay in each state $s^{\text{int}}_i$, while the dotted lines represent the marginalized reward for the action switch at each state. For the sake of explanation, the learning process can be thought of as passing through three phases, where each phase corresponds to learning a single abstraction module.
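For concreteness, here is one hypothetical way to construct three 2D oscillatory streams of increasing encoding difficulty; the exact signals used in the experiment are not reproduced here.

```python
import numpy as np

def make_streams(n_samples=50000):
    """Hypothetical 2D oscillatory streams of increasing encoding
    difficulty: x1 mixes a slow and a fast sinusoid, while x2 and x3
    add progressively faster, modulated, and noisier components."""
    t = np.linspace(0, 200 * np.pi, n_samples)
    x1 = np.stack([np.sin(t), np.cos(7 * t)], axis=1)
    x2 = np.stack([np.sin(2 * t + np.cos(5 * t)), np.cos(11 * t)], axis=1)
    x3 = np.stack([np.sin(3 * t + np.cos(9 * t))
                   + 0.1 * np.random.randn(n_samples),
                   np.cos(17 * t)], axis=1)
    return [x1, x2, x3]
```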
Phase 1: At the beginning of Phase 1, the agent starts exploring by executing either stay or switch at each state. After a few hundred algorithm iterations, the reward function begins to stabilize and is such that $R^{\text{int}}(s^{\text{int}}_1,$ stay$) > R^{\text{int}}(s^{\text{int}}_2,$ stay$) > R^{\text{int}}(s^{\text{int}}_3,$ stay$) > 0$, ordered according to the learning difficulty of the observation streams. The reward components for the switch action, however, are either close to zero or negative. Therefore, the policy $\pi^{\text{int}}$ converges to the optimal policy (i.e., stay at the state corresponding to the easiest observation stream ${\bf x}_1$ and switch at every other state). As $\epsilon$ decays, the agent begins to exploit the learned policy, and the adaptive IncSFA-ROC abstraction $\widehat{\phi}$ converges to the slow feature corresponding to the observation stream ${\bf x}_1$. The ROC estimation error (Figure 6(c)) decreases and falls below the threshold $\delta$, at which point the abstraction is added to the abstraction set $\Phi$. The increase in the reward value of $R^{\text{int}}(s^{\text{int}}_1,$ stay$)$ near the end of the phase is caused by the second term ($\sum_{\tau} Z(|\delta - \xi^{roc}|)$) in the reward equation. Both $\epsilon$ and $R^{\text{int}}$ are then reset, and the algorithm enters Phase 2 ($t \approx 75k$).
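The shape of that second term can be sketched as follows, assuming a Gaussian-shaped kernel $Z$ that peaks as the ROC error $\xi^{roc}$ approaches the threshold $\delta$; the exact form of $Z$ and its parameters are those defined in the paper, not the illustrative choices below.

```python
import numpy as np

def threshold_bonus(xi_history, delta=0.05, sigma=0.01):
    """Sketch of the second reward term: sums a kernel Z that peaks when
    the ROC estimation error xi gets close to the threshold delta. The
    Gaussian shape and sigma are illustrative assumptions."""
    xi = np.asarray(xi_history)
    return np.sum(np.exp(-np.abs(delta - xi) ** 2 / sigma ** 2))
```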
Phase 2: The agent begins to explore again; however, it does not receive any reward for the ($s^{\text{int}}_1$, stay) tuple because of the gating system. After a few hundred algorithm iterations, $R^{\text{int}}(s^{\text{int}}_2,$ stay$) > R^{\text{int}}(s^{\text{int}}_3,$ stay$) > R^{\text{int}}(s^{\text{int}}_1,$ stay$) = 0$, and the adaptive abstraction converges again, this time to the slow feature corresponding to the observation stream ${\bf x}_2$.
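A minimal sketch of the gating test, assuming each frozen module can report an estimation error on the current observation (estimation_error is a hypothetical method name, and reusing $\delta$ as the gating threshold is an illustrative choice; the actual criterion is the one defined for the algorithm):

```python
def gate(frozen_modules, x, delta=0.05):
    """Return True if some frozen module already encodes x well, i.e., its
    estimation error on x is below the convergence threshold, in which case
    the observation earns no intrinsic reward."""
    return any(m.estimation_error(x) < delta for m in frozen_modules)
```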
Phase 3: The process repeats until the third abstraction is learned. A link to the Python implementation of the Curious Dr. MISFA algorithm can be found in the Software section below.

Videos

Software

Python (requires PyQtGraph)
