We consider the adaptive shortest-path routing problem in wireless networks
under unknown and stochastically varying link states. In this problem, we aim
to optimize the quality of communication between a source and a destination
through adaptive path selection. Due to the randomness and uncertainties in the
network dynamics, the quality of each link varies over time according to a
stochastic process with unknown distributions. After a path is selected for
communication, the aggregated quality of all links on this path (e.g., total
path delay) is observed.
We consider the multi-armed bandit problems in which a player aims to accrue
reward by sequentially playing a given set of arms with unknown reward
statistics. In the classic work, policies were proposed to achieve the optimal
logarithmic regret order for some special classes of light-tailed reward
distributions, e.g., Auer et al.'s UCB1 index policy for reward distributions
with finite support.
In the Multi-Armed Bandit (MAB) problem, there are a given set of arms with
unknown reward distributions. At each time, a player selects one arm to play,
aiming to maximize the total expected reward over a horizon of length T. The
essence of the problem is the tradeoff between exploration and exploitation:
playing a less explored arm to learn its reward statistics for future benefit
or playing the arm with the largest sample mean at the current time.
A flaw in the greedy approximation algorithm proposed by Zhang et al. for
minimum connected set cover problem is corrected, and a stronger result on the
approximation ratio of the modified greedy algorithm is established. The
results are now consistent with the existing results on connected dominating
set problem which is a special case of the minimum connected set cover problem.
We consider decentralized restless multi-armed bandit problems with unknown
dynamics and multiple players. The reward state of each arm transits according
to an unknown Markovian rule when it is played and evolves according to an
arbitrary unknown random process when it is passive. Players activating the
same arm at the same time collide and suffer from reward loss. The objective is
to maximize the long-term reward by designing a decentralized arm selection
policy to address unknown reward models and collisions among players.
We consider the restless multi-armed bandit (RMAB) problem with unknown
dynamics. In this problem, at each time, a player chooses K out of N (N > K)
arms to play. The state of each arm determines the reward when the arm is
played and transits according to Markovian rules no matter the arm is engaged
or passive. The Markovian dynamics of the arms are unknown to the player. The
objective is to maximize the long-term reward by designing an optimal arm
selection policy.
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are
$N$ arms, with rewards on all arms evolving at each time as Markov chains with
known parameters. A player seeks to activate $K \geq 1$ arms at each time in
order to maximize the expected total reward obtained over multiple plays. RMAB
is a challenging problem that is known to be PSPACE-hard in general. We
consider in this work the even harder non-Bayesian RMAB, in which the
parameters of the Markov chain are assumed to be unknown \emph{a priori}.
We analyze the multihop delay of ad hoc cognitive radio networks, where the
transmission delay of each hop consists of the propagation delay and the
waiting time for the availability of the communication channel (i.e., the
occurrence of a spectrum opportunity at this hop). Using theories and
techniques from continuum percolation and ergodicity, we establish the scaling
law of the minimum multihop delay with respect to the source-destination
distance in cognitive radio networks.
We consider multi-armed bandit with distributed players, where each player
independently samples one of N stochastic processes with unknown parameters and
accrues reward in each slot without information exchange. Users choosing the
same arm collide, and none or only one receives reward depending on the
collision model. This problem can be formulated as a decentralized multi-armed
bandit problem.
We consider multi-armed bandit with distributed players, where each player
independently samples one of N stochastic processes with unknown parameters and
accrues reward in each slot without information exchange. Users choosing the
same arm collide, and none or only one receives reward depending on the
collision model. This problem can be formulated as a decentralized multi-armed
bandit problem.
In this paper we investigate pointed Hopf algebras via quiver methods. We
classify all possible Hopf structures arising from minimal Hopf quivers, namely
basic cycles and the linear chain. This provides full local structure
information for general pointed Hopf algebras.
In this paper we investigate pointed Hopf algebras via quiver methods. We
classify all possible Hopf structures arising from minimal Hopf quivers, namely
basic cycles and the linear chain. This provides full local structure
information for general pointed Hopf algebras.