To meet the increased computational demands and stricter power constraints of modern applications, architectures have evolved to include dedicated domain-specific accelerators. Designing an efficient accelerator requires addressing three main components: compute, memory, and control. Moreover, since SoCs usually contain multiple accelerators, selecting the right one for each task also becomes crucial. This is especially relevant for Flexible Processing Units (xPUs), processing units that provide multiple functionalities with the same hardware. While it is possible to share support components across all functionalities, doing so leads to suboptimal performance. In this work, we take one example of such an xPU and analyze the aspects that have not yet been fully addressed, showing that there is further potential to be exploited. By understanding the required memory patterns, we achieve speedups of up to 72% compared to using memory support optimized for a different functionality. Moreover, we present an in-depth analysis of both functionalities to help select the most suitable one. In this way, hardware utilization can be maximized by combining both functionalities.