In federated learning (FL), if the participating mobile devices have low computing power and poor wireless channel conditions, and/or lack sufficient data for various classes, a long convergence time is required to achieve the desired model accuracy. To address this problem, we first formulate a constrained Markov decision process (CMDP) problem that aims to minimize the average round time while keeping the number of trained data samples and the number of trained data classes above given thresholds. To obtain the optimal scheduling policy, the formulated CMDP problem is converted into an equivalent linear program (LP). Additionally, to overcome the curse of dimensionality in the CMDP, we develop a joint client selection and bandwidth allocation algorithm (J-CSBA) that, at each round, jointly selects appropriate mobile devices and allocates a suitable amount of bandwidth to them by considering their data information, computing power, and channel gain. Evaluation results show that J-CSBA can reduce the convergence time by up to $49\%$ compared to a conventional random scheme.
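
To make the idea of jointly selecting devices and splitting bandwidth concrete, the following is a minimal sketch of a greedy heuristic; it is not the J-CSBA algorithm itself, whose details are not given here. It assumes that a round ends when the slowest selected device finishes local training and uploading, that the upload rate of a device is its bandwidth times a fixed spectral efficiency (for example $\log_2(1+\mathrm{SNR})$), and that devices are added one at a time until the thresholds on data samples and data classes are met. The names `Device`, `round_time`, and `select_clients`, and all fields and parameters, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    num_samples: int       # local training samples (assumed known)
    classes: frozenset     # labels present in the local data
    compute_time: float    # seconds needed for local training
    spectral_eff: float    # bits/s/Hz, assumed fixed, e.g. log2(1 + SNR)


def _bandwidth_needed(devices, t, model_bits):
    # Bandwidth each device needs to finish its upload exactly at time t.
    return {d.name: model_bits / (d.spectral_eff * (t - d.compute_time))
            for d in devices}


def round_time(devices, total_bw, model_bits, tol=1e-9):
    """Smallest round time for which the total bandwidth suffices (bisection)."""
    if not devices:
        return 0.0
    lo = max(d.compute_time for d in devices)  # would need infinite bandwidth
    hi = lo + 1.0
    # Grow the upper bound until the bandwidth budget is respected.
    while sum(_bandwidth_needed(devices, hi, model_bits).values()) > total_bw:
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(_bandwidth_needed(devices, mid, model_bits).values()) > total_bw:
            lo = mid
        else:
            hi = mid
    return hi


def select_clients(devices, total_bw, model_bits, min_samples, min_classes):
    """Greedily add the device that keeps the round time lowest."""
    selected, remaining = [], list(devices)

    def satisfied(sel):
        samples = sum(d.num_samples for d in sel)
        classes = set().union(*(d.classes for d in sel))
        return samples >= min_samples and len(classes) >= min_classes

    while not satisfied(selected):
        if not remaining:
            raise ValueError("thresholds cannot be met with the given devices")
        best = min(remaining,
                   key=lambda d: round_time(selected + [d], total_bw, model_bits))
        selected.append(best)
        remaining.remove(best)

    t = round_time(selected, total_bw, model_bits)
    return selected, t, _bandwidth_needed(selected, t, model_bits)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from any evaluation.
    pool = [
        Device("a", 100, frozenset({0, 1}), 1.0, 4.0),
        Device("b", 100, frozenset({2, 3}), 1.0, 4.0),
        Device("c", 500, frozenset({0, 1, 2, 3}), 10.0, 0.5),
    ]
    chosen, t, bw = select_clients(pool, total_bw=1e6, model_bits=8e6,
                                   min_samples=200, min_classes=4)
    print([d.name for d in chosen], t, bw)
```

Under these assumptions, the bandwidth split equalizes the finishing times of all selected devices, so a device with a weak channel or slow processor is admitted only when the data and class thresholds cannot be met more cheaply without it.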