We consider an opportunistic spectrum access (OSA) problem where the
time-varying condition of each channel (e.g., as a result of random fading or
certain primary users' activities) is modeled as an arbitrary finite-state
Markov chain. At each instance of time, a (secondary) user probes a channel and
collects a certain reward as a function of the state of the channel (e.g., good
channel condition results in higher data rate for the user).
We consider the classical multi-armed bandit problem with Markovian rewards.
When played an arm changes its state in a Markovian fashion while it remains
frozen when not played. The player receives a state-dependent reward each time
it plays an arm. The number of states and the state transition probabilities of
an arm are unknown to the player. The player's objective is to maximize its
long-term total reward by learning the best arm over time.
In this paper, we propose and analyze the properties of a new class of games
- the network congestion game (NCG), which is a generalization of the classical
congestion game (CG). In a classical congestion game,multiple users share the
same set of resources and a users payoff for using any resource is a function
of the total number of users sharing it. This game enjoys some very appealing
properties, including the existence of a pure strategy Nash equilibrium (NE)
and that every improvement path is finite and leads to such a NE (also called
the finite improvement property or FIP).