Abstract

This study proposes a multi-scale feature fusion audio separation network (MFF-ASNet) based on the convolutional time-domain audio separation network (Conv-TasNet). The network is designed to separate mixed multi-source signals acquired by ultra-weak fiber Bragg grating distributed acoustic sensing (uwDAS) for traffic vehicle signal processing. The encoder of MFF-ASNet consists of parallel one-dimensional convolutions with different kernel sizes followed by an attention feature fusion (AFF) module, and a segmented attention module (SAM) is added between the encoder and the separator to further refine feature selection. The network is trained with the scale-invariant signal-to-distortion ratio (SISDR) and mean squared error (MSE) as loss functions. Under laboratory conditions, MFF-ASNet is trained on simulated mixed data and tested on real signals, achieving a validation loss of −18 dB, about 5 dB lower than the Conv-TasNet baseline. In two-source and three-source separation experiments, the mean time-frequency entropy of the footstep signal estimated by Conv-TasNet deviates from that of the source signal by 0.56 and 0.23, respectively, whereas MFF-ASNet reduces these gaps to 0 and 0.06. When processing discontinuous signals, MFF-ASNet produces clearer Mel spectra. In multi-source separation experiments on artificially generated road vibration signals, the estimated signals closely match the sources in both waveform and Mel spectrum, with only minimal differences, indicating accurate separation. The method also successfully separates real mixed signals involving vehicles and road bumps.
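The abstract names the scale-invariant signal-to-distortion ratio as one of the two training objectives. As a point of reference, the sketch below shows the standard SISDR loss in PyTorch; the function name, tensor shapes, and epsilon value are illustrative assumptions, and the paper's actual implementation (including how it is combined with the MSE term) may differ.

```python
import torch

def si_sdr_loss(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SDR averaged over the batch.

    estimate, target: tensors of shape (batch, samples). This follows the
    standard SISDR definition and is a generic sketch, not the authors'
    exact training code.
    """
    # Remove per-signal DC offset so the measure is invariant to bias.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)

    # Project the estimate onto the target to obtain the optimal scaling.
    dot = torch.sum(estimate * target, dim=-1, keepdim=True)
    energy = torch.sum(target ** 2, dim=-1, keepdim=True) + eps
    scaled_target = (dot / energy) * target

    # SISDR = 10 * log10(||scaled_target||^2 / ||estimate - scaled_target||^2)
    noise = estimate - scaled_target
    ratio = torch.sum(scaled_target ** 2, dim=-1) / (torch.sum(noise ** 2, dim=-1) + eps)
    si_sdr = 10 * torch.log10(ratio + eps)

    # Return the negative mean so that a lower loss means better separation.
    return -si_sdr.mean()
```

Because the loss is the negative SISDR in decibels, a reported validation loss of −18 dB corresponds to an average SISDR of about 18 dB on the validation set.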