M. Vázquez, M. W. Azhar and P. Trancoso, “Exploiting the Potential of Flexible Processing Units,” 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Porto Alegre, Brazil, 2023, pp. 34-45, doi: 10.1109/SBAC-PAD59825.2023.00013.

In order to meet the increased computational demands and stricter power constraints of modern applications, architectures have evolved to include domain-specific dedicated accelerators. In order to design efficient accelerators, three main components need to be addressed: compute, memory, and control. Moreover, since SoCs usually contain multiple accelerators, selecting the right one for each task also becomes crucial. This becomes especially […]

Max Doblas Font, Oscar Lostes-Cazorla, Quim Aguado-Puig, Nick Cebry, Pau Fontova, Christopher Batten, Santiago Marco-Sola, and Miquel Moreto. “GMX: Instruction Set Extensions for Fast, Scalable, and Efficient Genome Sequence Alignment.” 56th ACM/IEEE Int’l Symp. on Microarchitecture (MICRO), Oct. 2023.

Sequence alignment remains a fundamental problem in computer science with practical applications ranging from pattern matching to computational biology. The ever-increasing volumes of genomic data produced by modern DNA sequencers motivate improved software and hardware sequence alignment accelerators that scale with longer sequence lengths and high error rates without losing accuracy. Furthermore, the wide variety of use cases requiring sequence alignment demands flexible and efficient solutions that can […]

Jing Chen, Madhavan Manivannan, Bhavishya Goel, and Miquel Pericàs. “JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency”. In Proceedings of the 52nd International Conference on Parallel Processing (ICPP ’23). Association for Computing Machinery, New York, NY, USA, 828–838. https://doi.org/10.1145/3605573.3605586

Energy-efficient execution of task-based parallel applications is crucial as tasking is a widely supported feature in many parallel programming libraries and runtimes. Currently, state-of-the-art proposals primarily rely on leveraging core asymmetry and CPU DVFS. Additionally, these proposals mostly use heuristics and lack the ability to explore the trade-offs between energy usage and performance. However, our findings demonstrate that focusing solely on CPU energy […]

Lluc Alvarez, Abraham Ruiz, Arnau Bigas-Soldevilla, Pavel Kuroedov, Alberto Gonzalez, Hamsika Mahale, Noe Bustamante, Albert Aguilera, Francesco Minervini, Javier Salamero, Oscar Palomar, Vassilis Papaefstathiou, Antonis Psathakis, Nikolaos Dimou, Michalis Giaourtas, Iasonas Mastorakis, Georgios Ieronymakis, Georgios-Michail Matzouranis, Vasilis Flouris, Nick Kossifidis, Manolis Marazakis, Bhavishya Goel, Madhavan Manivannan, Ahsen Ejaz, Panagiotis Strikos, Mateo Vázquez, Ioannis Sourdis, Pedro Trancoso, Per Stenström, Jens Hagemeyer, Lennart Tigges, Nils Kucza, Jean-Marc Philippe, and Ioannis Papaefstathiou. “eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem”. In Proceedings of the 20th ACM International Conference on Computing Frontiers (CF ’23). Association for Computing Machinery, New York, NY, USA, 309–314. https://doi.org/10.1145/3587135.3592178

The eProcessor project aims at creating a RISC-V full stack ecosystem. The eProcessor architecture combines a high-performance out-of-order core with energy-efficient accelerators for vector processing and artificial intelligence with reduced-precision functional units. The design of this architecture follows a hardware/software co-design approach with relevant application use cases from the high-performance computing, bioinformatics and artificial intelligence domains. Two eProcessor prototypes will […]

M. V. Maceiras, M. Waqar Azhar and P. Trancoso, “VSA: A Hybrid Vector-Systolic Architecture,” 2022 IEEE 40th International Conference on Computer Design (ICCD), Olympic Valley, CA, USA, 2022, pp. 368-376, doi: 10.1109/ICCD56317.2022.00061

In order to deliver high performance efficiently, modern processors include dedicated hardware to accelerate different application domains. For example, several recent processors include dedicated Machine Learning (ML) accelerators. However, while adding dedicated hardware improves efficiency compared to general-purpose CPUs, it also requires a larger area, making it unfeasible for smaller devices. Therefore, exploring ways to use the existing hardware for […]

Josue Quiroga, Roberto Ignacio Genovese, Ivan Diaz, Henrique Yano, Asif Ali, Nehir Sonmez, Oscar Palomar, Victor Jimenez, Mario Rodriguez, Marc Dominguez, “Reusable Verification Environment for a RISC-V Vector Accelerator”, DVCon 2022.

This paper presents a reusable verification environment developed for the verification of an academic RISC-V based vector accelerator that operates with long vectors. In order to be used across diverse projects, this infrastructure intends to be independent of the interface used for connecting the accelerator to the scalar processor core. We built a verification infrastructure consisting of a Universal Verification […]

Jing Chen, Madhavan Manivannan, Bhavishya Goel, Mustafa Abduljabbar, and Miquel Pericàs. “STEER: Asymmetry-aware Energy Efficient Task Scheduler for Cluster-based Multicore Architectures”. In Proceedings of the IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2022).

Reducing the energy consumption of parallel applications is becoming increasingly important. Current chip multiprocessors (CMPs) incorporate asymmetric cores (i.e. static asymmetry) and DVFS (i.e. dynamic asymmetry) to enable energy efficient execution. To reduce cost and complexity, designs typically organize asymmetric cores into core-clusters supporting the same DVFS setting across cores in a cluster. Recent approaches that focus on energy efficient […]

Jing Chen, Madhavan Manivannan, Mustafa Abduljabbar, and Miquel Pericàs. “ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes.” ACM Transactions on Architecture and Code Optimization (TACO) 19, 2, Article 27.

Parallel applications often rely on work stealing schedulers in combination with fine-grained tasking to achieve high performance and scalability. However, reducing the total energy consumption in the context of work stealing runtimes is still challenging, particularly when using asymmetric architectures with different types of CPU cores. A common approach for energy savings involves dynamic voltage and frequency scaling (DVFS) wherein […]

eProcessor Presented at HPCA Workshop

We are happy that our paper entitled “Accelerating the Wavefront Alignment Algorithm on CPUs, GPUs and FPGAs” by Miquel Moreto and Santiago Marco-Sola was presented at the 4th HPCA Workshop on Accelerator Architecture in Computational Biology and Bioinformatics (https://aacbb-workshop.github.io/).

A. Ejaz and I. Sourdis, “FastTrackNoC: A NoC with FastTrack Router Datapaths,” 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea, 2022, pp. 971-985, doi: 10.1109/HPCA53966.2022.00075.

This paper introduces FastTrackNoC, a Network-on-Chip (NoC) router architecture that reduces packet latency by bypassing its switch traversal (ST) stage. It is based on the observation that there is a bias in the direction a flit takes through a router, e.g., in a 2D mesh network, non-turning hops are preferred, especially when dimension order routing is used. FastTrackNoC capitalizes on […]
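The directional bias mentioned in this abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes XY dimension-order routing on a 2D mesh, uniform all-to-all traffic, and counts only traversals through intermediate routers (injection and ejection at the source and destination are ignored). The function names `xy_route` and `turn_statistics` are hypothetical.

```python
from itertools import product


def xy_route(src, dst):
    """Hop directions for XY dimension-order routing: all X hops, then all Y hops."""
    (sx, sy), (dx, dy) = src, dst
    x_dir = "E" if dx > sx else "W"
    y_dir = "N" if dy > sy else "S"
    return [x_dir] * abs(dx - sx) + [y_dir] * abs(dy - sy)


def turn_statistics(width, height):
    """Count straight vs. turning router traversals over all source/destination pairs.

    Assumption: uniform all-to-all traffic, one packet per ordered pair.
    """
    straight = turning = 0
    nodes = list(product(range(width), range(height)))
    for src, dst in product(nodes, nodes):
        hops = xy_route(src, dst)
        # Each consecutive pair of hops is one traversal of an intermediate
        # router: the packet enters in direction `prev` and leaves in `nxt`.
        for prev, nxt in zip(hops, hops[1:]):
            if prev == nxt:
                straight += 1
            else:
                turning += 1
    return straight, turning


if __name__ == "__main__":
    print(turn_statistics(4, 4))  # → (256, 144)
```

Under these assumptions, a packet turns at most once (when it switches from the X to the Y dimension), so in a 4x4 mesh 256 of the 400 intermediate router traversals are non-turning. The share of straight traversals grows with mesh size, since longer paths add straight hops while the turn count per packet stays at most one.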
