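Computer science
Transpose
Extension (predicate logic)
Instruction set
Parallel computing
Set (abstract data type)
Programming language
Physics
Quantum mechanics
Eigenvector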
Source
Journal: International Journal of Computer Science and Information Technology
[Academy and Industry Research Collaboration Center]
Date: 2014-06-30
Volume (issue): 6 (3): 67-78
Citations: 5
Identifier
DOI:10.5121/ijcsit.2014.6305
Abstract
General-purpose microprocessors are augmented with short-vector instruction extensions in order to process more than one data element simultaneously using the same operation. This type of parallelism is known as data-parallel processing. Many scientific, engineering, and signal processing applications can be formulated as matrix operations. Therefore, accelerating these kernel operations on microprocessors, which are the building blocks of large high-performance computing systems, will boost the performance of these applications. In this paper, we consider the acceleration of the matrix transpose operation using the 256-bit Intel Advanced Vector Extensions (AVX) instructions. We present a novel vector-based matrix transpose algorithm and its optimized implementation using AVX instructions. The experimental results on an Intel Core i7 processor demonstrate a speedup of 2.83 over the standard sequential implementation, and a maximum speedup of 1.53 over the GCC library implementation. When the transpose is combined with matrix addition to compute the matrix update B + A^T, where A and B are square matrices, the speedup of our implementation over the sequential algorithm increases to 3.19.
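For orientation, the two kernels named in the abstract can be sketched in scalar C. This is only a guess at what a "standard sequential implementation" looks like, and it is not the authors' AVX algorithm, which this record does not reproduce. The function names `transpose_seq` and `update_seq`, the `float` element type, and the row-major layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sequential baseline (names and layout are assumptions).
   Out-of-place transpose of an n x n row-major matrix:
   out[j][i] = a[i][j]. */
void transpose_seq(const float *a, float *out, size_t n)
{
    assert(a != out); /* input and output must not alias */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[j * n + i] = a[i * n + j];
}

/* Hypothetical sequential matrix update c = b + a^T for n x n
   row-major matrices. c may alias b, but must not alias a. */
void update_seq(const float *a, const float *b, float *c, size_t n)
{
    assert(a != c);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = b[i * n + j] + a[j * n + i];
}
```

In both loops one of the two matrices is accessed with stride n, which is the access pattern a vectorized transpose has to reorganize. A 256-bit AVX register holds eight single-precision values, so a vector version would typically work on small tiles held in registers rather than on single elements.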