the crypto engine positioning needs to be optimised to achieve high throughput and enhance the parallel processing capability. As shown in Figure 4, this NSP architecture provides a scalable platform for configuring data processing arrays as well as data transfer paths. System design parameters, such as the number of parallel crypto engines and the number of C/WDMAs and RDMAs, are all scalable and configurable to meet different cost/performance tradeoffs. The processing capability of the crypto engine arrays needs to match the data transfer capability, so that data transfer efficiency does not become a bottleneck for overall performance and unnecessary area and power consumption is avoided. A coarse-grain performance evaluation experiment was carried out to determine the specific design parameters, such as the number of crypto engines for each kind of cryptographic algorithm, the bus data width, and the number of C/WDMAs and RDMAs.

A software implementation of the coarse-grain analytical performance evaluation model was developed based on the platform architecture shown in Figure 5. Different system design parameter patterns are generated to configure the system, with statistical security-related workloads imported as input stimulus. Figure 6 illustrates the performance evaluation results for the DES and ECC arrays. An optimised DES core, which delivers a throughput of 1200 Mbps, and a systolic-array-based ECC core, which performs 1450 256-bit scalar multiplications per second on a general curve over GF(2^n), were used for these performance evaluations. The input stimulus used an initial workload of 1 KB data packets, with the input workload gradually increasing. Four different configuration patterns were generated for the comparison. The curves follow similar trends in both plots, with the throughput peaking at four times the initial workload and dropping as the workload continues to increase.
The reason is that the delay between requests of the same type for the crypto engines shrinks as the input workload grows, and resource contention, for example for the DMAs and buses, increases as the input workload
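The requirement that the crypto engine array's processing capability match the data transfer capability can be illustrated with a minimal peak-rate sketch. This is not the evaluation model described above: it ignores request timing and DMA/bus contention entirely. The 1200 Mbps DES core throughput comes from the text; the bus width, the bus clock, and the function names are hypothetical values chosen only for illustration.

```python
import math


def engines_to_match_transfer(transfer_mbps: float, engine_mbps: float) -> int:
    """Smallest number of parallel engines whose combined peak
    throughput covers the given data transfer rate."""
    return math.ceil(transfer_mbps / engine_mbps)


def array_throughput_mbps(n_engines: int, engine_mbps: float,
                          transfer_mbps: float) -> float:
    """Coarse upper bound: the slower of the engine array and the
    data transfer path limits the achievable throughput."""
    return min(n_engines * engine_mbps, transfer_mbps)


DES_CORE_MBPS = 1200.0   # per-core DES throughput quoted in the text

# Assumptions for illustration only (not from the source):
BUS_WIDTH_BITS = 64      # hypothetical bus data width
BUS_CLOCK_MHZ = 100.0    # hypothetical bus clock

# Peak transfer rate in Mbps: bits per cycle times million cycles per second.
transfer_mbps = BUS_WIDTH_BITS * BUS_CLOCK_MHZ

n_des = engines_to_match_transfer(transfer_mbps, DES_CORE_MBPS)
print(f"peak transfer rate: {transfer_mbps:.0f} Mbps")
print(f"DES engines needed to match it: {n_des}")

# Fewer engines leave the data path underused; more engines add
# area and power without raising the bound.
for n in (4, n_des, 8):
    bound = array_throughput_mbps(n, DES_CORE_MBPS, transfer_mbps)
    print(f"{n} engines -> at most {bound:.0f} Mbps")
```

Under these assumed bus figures, adding engines beyond the matching count does not raise the bound, which is the area and power saving the text refers to. A realistic estimate would need the contention effects that the evaluation model accounts for.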