FPGAの部屋 (FPGA Room): Changing the arithmetic precision of straight_conv_nn2 in the Vivado HLS CNN implementation

Topics on FPGAs and CPLDs, FPGA tools, and the like. It gets pretty niche. I also write diary entries.


This is a continuation of "Running the C simulation of straight_conv_nn2 in the Vivado HLS CNN implementation again".

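In the previous post, I concluded that straight_conv_nn2 scored well only because the test images happened to be easy. So I reran the C simulation of straight_conv_nn2 with images that are likely to give poor accuracy, and the hardware accuracy came out at 56.7 %. That is far too low, so I decided to revisit the arithmetic precision, that is, the bit widths. I will change the bit widths step by step and watch how the error count changes.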

The current bit widths are as follows.

ap_fixed<10, 6, AP_TRN_ZERO, AP_SAT> conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
ap_fixed<10, 6, AP_TRN_ZERO, AP_SAT> pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> dot1[100];
ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> dot2[NUM_OF_OUTPUT];


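The convolution layer (conv_out, pool_out) uses 10 bits with 6 integer bits, and the fully connected layers (dot1, dot2) use 13 bits with 7 integer bits. In ap_fixed<W, I>, the integer width I includes the sign bit, so the fractional parts are 4 and 6 bits, respectively.
Let's increase these widths one bit at a time while keeping the integer widths fixed, so each step adds one fractional bit. I also added code to the testbench that counts the errors.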
First, I reran the C simulation with the current bit widths.
[Figure: C simulation result (wlt_cnn_148_170911.png)]
The hardware error count, hw_err_count, is 65, and the software error count, sw_err_count, is 17.
Here is the log.

id = 0, max_id_ref = 1, max_id_hw = 2
id = 0, max_id_ref = 1, max_id_sw = 2
id = 1, max_id_ref = 1, max_id_hw = 2
id = 1, max_id_ref = 1, max_id_sw = 2
id = 5, max_id_ref = 1, max_id_hw = 2
id = 5, max_id_ref = 1, max_id_sw = 2
id = 6, max_id_ref = 1, max_id_hw = 2
id = 6, max_id_ref = 1, max_id_sw = 2
id = 10, max_id_ref = 1, max_id_hw = 2
id = 10, max_id_ref = 1, max_id_sw = 2
id = 15, max_id_ref = 1, max_id_sw = 2
id = 26, max_id_ref = 1, max_id_hw = 0
id = 27, max_id_ref = 1, max_id_hw = 0
id = 28, max_id_ref = 1, max_id_hw = 0
id = 29, max_id_ref = 1, max_id_hw = 0
id = 32, max_id_ref = 1, max_id_hw = 0
id = 33, max_id_ref = 1, max_id_hw = 0
id = 34, max_id_ref = 1, max_id_hw = 0
id = 38, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_sw = 0
id = 43, max_id_ref = 1, max_id_hw = 0
id = 44, max_id_ref = 1, max_id_hw = 0
id = 49, max_id_ref = 1, max_id_hw = 0
id = 80, max_id_ref = 0, max_id_sw = 2
id = 85, max_id_ref = 0, max_id_sw = 2
id = 86, max_id_ref = 0, max_id_sw = 2
id = 90, max_id_ref = 0, max_id_sw = 2
id = 91, max_id_ref = 0, max_id_sw = 2
id = 95, max_id_ref = 0, max_id_sw = 2
id = 96, max_id_ref = 0, max_id_sw = 2
id = 97, max_id_ref = 0, max_id_sw = 2
id = 98, max_id_ref = 0, max_id_sw = 2
id = 99, max_id_ref = 0, max_id_sw = 2
id = 100, max_id_ref = 2, max_id_hw = 0
id = 101, max_id_ref = 2, max_id_hw = 0
id = 102, max_id_ref = 2, max_id_hw = 0
id = 103, max_id_ref = 2, max_id_hw = 0
id = 104, max_id_ref = 2, max_id_hw = 0
id = 105, max_id_ref = 2, max_id_hw = 0
id = 106, max_id_ref = 2, max_id_hw = 0
id = 107, max_id_ref = 2, max_id_hw = 0
id = 108, max_id_ref = 2, max_id_hw = 0
id = 109, max_id_ref = 2, max_id_hw = 0
id = 110, max_id_ref = 2, max_id_hw = 0
id = 111, max_id_ref = 2, max_id_hw = 0
id = 112, max_id_ref = 2, max_id_hw = 0
id = 113, max_id_ref = 2, max_id_hw = 0
id = 114, max_id_ref = 2, max_id_hw = 0
id = 115, max_id_ref = 2, max_id_hw = 0
id = 116, max_id_ref = 2, max_id_hw = 0
id = 117, max_id_ref = 2, max_id_hw = 0
id = 118, max_id_ref = 2, max_id_hw = 0
id = 120, max_id_ref = 2, max_id_hw = 0
id = 121, max_id_ref = 2, max_id_hw = 0
id = 122, max_id_ref = 2, max_id_hw = 0
id = 123, max_id_ref = 2, max_id_hw = 0
id = 125, max_id_ref = 2, max_id_hw = 0
id = 126, max_id_ref = 2, max_id_hw = 0
id = 127, max_id_ref = 2, max_id_hw = 0
id = 128, max_id_ref = 2, max_id_hw = 0
id = 129, max_id_ref = 2, max_id_hw = 0
id = 130, max_id_ref = 2, max_id_hw = 0
id = 131, max_id_ref = 2, max_id_hw = 0
id = 132, max_id_ref = 2, max_id_hw = 0
id = 133, max_id_ref = 2, max_id_hw = 0
id = 134, max_id_ref = 2, max_id_hw = 0
id = 135, max_id_ref = 2, max_id_hw = 0
id = 136, max_id_ref = 2, max_id_hw = 0
id = 137, max_id_ref = 2, max_id_hw = 0
id = 138, max_id_ref = 2, max_id_hw = 0
id = 139, max_id_ref = 2, max_id_hw = 0
id = 140, max_id_ref = 2, max_id_hw = 0
id = 141, max_id_ref = 2, max_id_hw = 0
id = 142, max_id_ref = 2, max_id_hw = 0
id = 143, max_id_ref = 2, max_id_hw = 0
id = 144, max_id_ref = 2, max_id_hw = 0
id = 145, max_id_ref = 2, max_id_hw = 0
id = 146, max_id_ref = 2, max_id_hw = 0
id = 147, max_id_ref = 2, max_id_hw = 0
id = 148, max_id_ref = 2, max_id_hw = 0
id = 149, max_id_ref = 2, max_id_hw = 0
hw_err_count = 65
sw_err_count = 17



Next, let's increase the bit widths of the convolution and fully connected layers by 1 bit.

ap_fixed<11, 6, AP_TRN_ZERO, AP_SAT> conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
ap_fixed<11, 6, AP_TRN_ZERO, AP_SAT> pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
ap_fixed<14, 7, AP_TRN_ZERO, AP_SAT> dot1[100];
ap_fixed<14, 7, AP_TRN_ZERO, AP_SAT> dot2[NUM_OF_OUTPUT];


I ran the C simulation with this setting.
[Figure: C simulation result (wlt_cnn_149_170911.png)]
The hardware error count dropped to 45.
Here is the log.

id = 0, max_id_ref = 1, max_id_hw = 2
id = 0, max_id_ref = 1, max_id_sw = 2
id = 1, max_id_ref = 1, max_id_hw = 2
id = 1, max_id_ref = 1, max_id_sw = 2
id = 5, max_id_ref = 1, max_id_hw = 2
id = 5, max_id_ref = 1, max_id_sw = 2
id = 6, max_id_ref = 1, max_id_sw = 2
id = 10, max_id_ref = 1, max_id_hw = 2
id = 10, max_id_ref = 1, max_id_sw = 2
id = 15, max_id_ref = 1, max_id_sw = 2
id = 28, max_id_ref = 1, max_id_hw = 0
id = 29, max_id_ref = 1, max_id_hw = 0
id = 33, max_id_ref = 1, max_id_hw = 0
id = 34, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_sw = 0
id = 44, max_id_ref = 1, max_id_hw = 0
id = 80, max_id_ref = 0, max_id_sw = 2
id = 85, max_id_ref = 0, max_id_sw = 2
id = 86, max_id_ref = 0, max_id_sw = 2
id = 90, max_id_ref = 0, max_id_sw = 2
id = 91, max_id_ref = 0, max_id_sw = 2
id = 95, max_id_ref = 0, max_id_sw = 2
id = 96, max_id_ref = 0, max_id_sw = 2
id = 97, max_id_ref = 0, max_id_sw = 2
id = 98, max_id_ref = 0, max_id_sw = 2
id = 99, max_id_ref = 0, max_id_sw = 2
id = 101, max_id_ref = 2, max_id_hw = 0
id = 102, max_id_ref = 2, max_id_hw = 0
id = 103, max_id_ref = 2, max_id_hw = 0
id = 104, max_id_ref = 2, max_id_hw = 0
id = 106, max_id_ref = 2, max_id_hw = 0
id = 107, max_id_ref = 2, max_id_hw = 0
id = 108, max_id_ref = 2, max_id_hw = 0
id = 110, max_id_ref = 2, max_id_hw = 0
id = 111, max_id_ref = 2, max_id_hw = 0
id = 112, max_id_ref = 2, max_id_hw = 0
id = 116, max_id_ref = 2, max_id_hw = 0
id = 125, max_id_ref = 2, max_id_hw = 0
id = 126, max_id_ref = 2, max_id_hw = 0
id = 127, max_id_ref = 2, max_id_hw = 0
id = 128, max_id_ref = 2, max_id_hw = 0
id = 129, max_id_ref = 2, max_id_hw = 0
id = 130, max_id_ref = 2, max_id_hw = 0
id = 131, max_id_ref = 2, max_id_hw = 0
id = 132, max_id_ref = 2, max_id_hw = 0
id = 133, max_id_ref = 2, max_id_hw = 0
id = 134, max_id_ref = 2, max_id_hw = 0
id = 135, max_id_ref = 2, max_id_hw = 0
id = 136, max_id_ref = 2, max_id_hw = 0
id = 137, max_id_ref = 2, max_id_hw = 0
id = 138, max_id_ref = 2, max_id_hw = 0
id = 139, max_id_ref = 2, max_id_hw = 0
id = 140, max_id_ref = 2, max_id_hw = 0
id = 141, max_id_ref = 2, max_id_hw = 0
id = 142, max_id_ref = 2, max_id_hw = 0
id = 143, max_id_ref = 2, max_id_hw = 0
id = 144, max_id_ref = 2, max_id_hw = 0
id = 146, max_id_ref = 2, max_id_hw = 0
id = 147, max_id_ref = 2, max_id_hw = 0
id = 148, max_id_ref = 2, max_id_hw = 0
id = 149, max_id_ref = 2, max_id_hw = 0
hw_err_count = 45
sw_err_count = 17




Next, I increased the bit widths of the convolution and fully connected layers by 2 bits.

ap_fixed<12, 6, AP_TRN_ZERO, AP_SAT> conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
ap_fixed<12, 6, AP_TRN_ZERO, AP_SAT> pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
ap_fixed<15, 7, AP_TRN_ZERO, AP_SAT> dot1[100];
ap_fixed<15, 7, AP_TRN_ZERO, AP_SAT> dot2[NUM_OF_OUTPUT];


I ran the C simulation.
Here is the log.

id = 0, max_id_ref = 1, max_id_hw = 2
id = 0, max_id_ref = 1, max_id_sw = 2
id = 1, max_id_ref = 1, max_id_hw = 2
id = 1, max_id_ref = 1, max_id_sw = 2
id = 5, max_id_ref = 1, max_id_hw = 2
id = 5, max_id_ref = 1, max_id_sw = 2
id = 6, max_id_ref = 1, max_id_hw = 2
id = 6, max_id_ref = 1, max_id_sw = 2
id = 10, max_id_ref = 1, max_id_hw = 2
id = 10, max_id_ref = 1, max_id_sw = 2
id = 15, max_id_ref = 1, max_id_sw = 2
id = 29, max_id_ref = 1, max_id_hw = 0
id = 34, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_sw = 0
id = 44, max_id_ref = 1, max_id_hw = 0
id = 80, max_id_ref = 0, max_id_sw = 2
id = 85, max_id_ref = 0, max_id_sw = 2
id = 86, max_id_ref = 0, max_id_sw = 2
id = 90, max_id_ref = 0, max_id_sw = 2
id = 91, max_id_ref = 0, max_id_sw = 2
id = 95, max_id_ref = 0, max_id_sw = 2
id = 96, max_id_ref = 0, max_id_sw = 2
id = 97, max_id_ref = 0, max_id_sw = 2
id = 98, max_id_ref = 0, max_id_sw = 2
id = 99, max_id_ref = 0, max_id_sw = 2
id = 102, max_id_ref = 2, max_id_hw = 0
id = 103, max_id_ref = 2, max_id_hw = 0
id = 106, max_id_ref = 2, max_id_hw = 0
id = 107, max_id_ref = 2, max_id_hw = 0
id = 108, max_id_ref = 2, max_id_hw = 0
id = 111, max_id_ref = 2, max_id_hw = 0
id = 112, max_id_ref = 2, max_id_hw = 0
id = 125, max_id_ref = 2, max_id_hw = 0
id = 126, max_id_ref = 2, max_id_hw = 0
id = 127, max_id_ref = 2, max_id_hw = 0
id = 128, max_id_ref = 2, max_id_hw = 0
id = 129, max_id_ref = 2, max_id_hw = 0
id = 130, max_id_ref = 2, max_id_hw = 0
id = 131, max_id_ref = 2, max_id_hw = 0
id = 132, max_id_ref = 2, max_id_hw = 0
id = 133, max_id_ref = 2, max_id_hw = 0
id = 134, max_id_ref = 2, max_id_hw = 0
id = 135, max_id_ref = 2, max_id_hw = 0
id = 136, max_id_ref = 2, max_id_hw = 0
id = 137, max_id_ref = 2, max_id_hw = 0
id = 138, max_id_ref = 2, max_id_hw = 0
id = 139, max_id_ref = 2, max_id_hw = 0
id = 140, max_id_ref = 2, max_id_hw = 0
id = 141, max_id_ref = 2, max_id_hw = 0
id = 142, max_id_ref = 2, max_id_hw = 0
id = 143, max_id_ref = 2, max_id_hw = 0
id = 144, max_id_ref = 2, max_id_hw = 0
id = 146, max_id_ref = 2, max_id_hw = 0
id = 147, max_id_ref = 2, max_id_hw = 0
id = 148, max_id_ref = 2, max_id_hw = 0
id = 149, max_id_ref = 2, max_id_hw = 0
hw_err_count = 40
sw_err_count = 17


The hardware error count dropped to 40.


Next, I increased the bit widths of the convolution and fully connected layers by 3 bits.

ap_fixed<13, 6, AP_TRN_ZERO, AP_SAT> conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
ap_fixed<13, 6, AP_TRN_ZERO, AP_SAT> pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
ap_fixed<16, 7, AP_TRN_ZERO, AP_SAT> dot1[100];
ap_fixed<16, 7, AP_TRN_ZERO, AP_SAT> dot2[NUM_OF_OUTPUT];


I ran the C simulation.
Here is the log.

id = 0, max_id_ref = 1, max_id_hw = 2
id = 0, max_id_ref = 1, max_id_sw = 2
id = 1, max_id_ref = 1, max_id_hw = 2
id = 1, max_id_ref = 1, max_id_sw = 2
id = 5, max_id_ref = 1, max_id_hw = 2
id = 5, max_id_ref = 1, max_id_sw = 2
id = 6, max_id_ref = 1, max_id_hw = 2
id = 6, max_id_ref = 1, max_id_sw = 2
id = 10, max_id_ref = 1, max_id_hw = 2
id = 10, max_id_ref = 1, max_id_sw = 2
id = 15, max_id_ref = 1, max_id_sw = 2
id = 29, max_id_ref = 1, max_id_hw = 0
id = 34, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_sw = 0
id = 44, max_id_ref = 1, max_id_hw = 0
id = 80, max_id_ref = 0, max_id_sw = 2
id = 85, max_id_ref = 0, max_id_sw = 2
id = 86, max_id_ref = 0, max_id_sw = 2
id = 90, max_id_ref = 0, max_id_sw = 2
id = 91, max_id_ref = 0, max_id_sw = 2
id = 95, max_id_ref = 0, max_id_sw = 2
id = 96, max_id_ref = 0, max_id_sw = 2
id = 97, max_id_ref = 0, max_id_sw = 2
id = 98, max_id_ref = 0, max_id_sw = 2
id = 99, max_id_ref = 0, max_id_sw = 2
id = 102, max_id_ref = 2, max_id_hw = 0
id = 107, max_id_ref = 2, max_id_hw = 0
id = 126, max_id_ref = 2, max_id_hw = 0
id = 127, max_id_ref = 2, max_id_hw = 0
id = 128, max_id_ref = 2, max_id_hw = 0
id = 129, max_id_ref = 2, max_id_hw = 0
id = 131, max_id_ref = 2, max_id_hw = 0
id = 132, max_id_ref = 2, max_id_hw = 0
id = 133, max_id_ref = 2, max_id_hw = 0
id = 134, max_id_ref = 2, max_id_hw = 0
id = 136, max_id_ref = 2, max_id_hw = 0
id = 137, max_id_ref = 2, max_id_hw = 0
id = 138, max_id_ref = 2, max_id_hw = 0
id = 139, max_id_ref = 2, max_id_hw = 0
id = 142, max_id_ref = 2, max_id_hw = 0
id = 143, max_id_ref = 2, max_id_hw = 0
id = 144, max_id_ref = 2, max_id_hw = 0
id = 147, max_id_ref = 2, max_id_hw = 0
id = 148, max_id_ref = 2, max_id_hw = 0
hw_err_count = 28
sw_err_count = 17


The hardware error count dropped to 28.


Next, I increased the bit widths of the convolution and fully connected layers by 4 bits.

ap_fixed<14, 6, AP_TRN_ZERO, AP_SAT> conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
ap_fixed<14, 6, AP_TRN_ZERO, AP_SAT> pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
ap_fixed<17, 7, AP_TRN_ZERO, AP_SAT> dot1[100];
ap_fixed<17, 7, AP_TRN_ZERO, AP_SAT> dot2[NUM_OF_OUTPUT];


I ran the C simulation.
Here is the log.

id = 0, max_id_ref = 1, max_id_hw = 2
id = 0, max_id_ref = 1, max_id_sw = 2
id = 1, max_id_ref = 1, max_id_hw = 2
id = 1, max_id_ref = 1, max_id_sw = 2
id = 5, max_id_ref = 1, max_id_hw = 2
id = 5, max_id_ref = 1, max_id_sw = 2
id = 6, max_id_ref = 1, max_id_hw = 2
id = 6, max_id_ref = 1, max_id_sw = 2
id = 10, max_id_ref = 1, max_id_hw = 2
id = 10, max_id_ref = 1, max_id_sw = 2
id = 15, max_id_ref = 1, max_id_sw = 2
id = 34, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_hw = 0
id = 39, max_id_ref = 1, max_id_sw = 0
id = 80, max_id_ref = 0, max_id_sw = 2
id = 85, max_id_ref = 0, max_id_sw = 2
id = 86, max_id_ref = 0, max_id_sw = 2
id = 90, max_id_ref = 0, max_id_sw = 2
id = 91, max_id_ref = 0, max_id_sw = 2
id = 95, max_id_ref = 0, max_id_sw = 2
id = 96, max_id_ref = 0, max_id_sw = 2
id = 97, max_id_ref = 0, max_id_sw = 2
id = 98, max_id_ref = 0, max_id_sw = 2
id = 99, max_id_ref = 0, max_id_sw = 2
id = 102, max_id_ref = 2, max_id_hw = 0
id = 107, max_id_ref = 2, max_id_hw = 0
id = 126, max_id_ref = 2, max_id_hw = 0
id = 127, max_id_ref = 2, max_id_hw = 0
id = 128, max_id_ref = 2, max_id_hw = 0
id = 129, max_id_ref = 2, max_id_hw = 0
id = 131, max_id_ref = 2, max_id_hw = 0
id = 132, max_id_ref = 2, max_id_hw = 0
id = 133, max_id_ref = 2, max_id_hw = 0
id = 134, max_id_ref = 2, max_id_hw = 0
id = 136, max_id_ref = 2, max_id_hw = 0
id = 137, max_id_ref = 2, max_id_hw = 0
id = 138, max_id_ref = 2, max_id_hw = 0
id = 139, max_id_ref = 2, max_id_hw = 0
id = 142, max_id_ref = 2, max_id_hw = 0
id = 143, max_id_ref = 2, max_id_hw = 0
id = 144, max_id_ref = 2, max_id_hw = 0
id = 147, max_id_ref = 2, max_id_hw = 0
id = 148, max_id_ref = 2, max_id_hw = 0
id = 149, max_id_ref = 2, max_id_hw = 0
hw_err_count = 27
sw_err_count = 17


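The hardware error count was 27.
Adding bits now removes only a few more errors (28 to 27 for the last step), so I stopped the experiment here.
In the end, I decided to go with the +3-bit widths. The ap_fixed declarations are as follows.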

ap_fixed<13, 6, AP_TRN_ZERO, AP_SAT> conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
ap_fixed<13, 6, AP_TRN_ZERO, AP_SAT> pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
ap_fixed<16, 7, AP_TRN_ZERO, AP_SAT> dot1[100];
ap_fixed<16, 7, AP_TRN_ZERO, AP_SAT> dot2[NUM_OF_OUTPUT];


Let's compute the accuracy with this setting.
Hardware: 28 errors, so (150 - 28) / 150 x 100 ≈ 81.3 %
Software: 17 errors, so (150 - 17) / 150 x 100 ≈ 88.7 %

Finally, here is straight_conv_nn_tb.cpp.

// straight_conv_nn_tb.cpp
// 2017/08/28 by marsee
// Number of kernels in the convolution layer: 2
//

#include <stdio.h>
#include <ap_fixed.h>

#include "conv1_weight.h"
#include "conv1_bias.h"
#include "af1_weight.h"
#include "af1_bias.h"
#include "af2_weight.h"
#include "af2_bias.h"

#include "straight_data.h"

#define NUM_OF_KERNELS 2
#define COULMN_PIXELS 56
#define ROW_PIXELS 10
#define ALL_PIXELS 560
#define NUM_OF_OUTPUT 3

int straight_conv_nn(ap_ufixed<8, 0, AP_TRN_ZERO, AP_SAT> in[ALL_PIXELS], ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> out[NUM_OF_OUTPUT]);
int straight_conv_nn_float(float in[ALL_PIXELS], float out[NUM_OF_OUTPUT]);
int max_ap_fixed(ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> out[NUM_OF_OUTPUT]);
int max_float(float out[NUM_OF_OUTPUT]);

#define NUM_ITERATIONS    150 // C Simulation
//#define NUM_ITERATIONS    2 // C/RTL CoSimulation

int main(){
    float t_tran_float[NUM_ITERATIONS][ALL_PIXELS];
    ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> result_ap_fixed[NUM_ITERATIONS][NUM_OF_OUTPUT];
    float result_float[NUM_ITERATIONS][NUM_OF_OUTPUT];
    int max_id_hw, max_id_sw, max_id_ref;

    for(int i=0; i<NUM_ITERATIONS; i++)
        for(int j=0; j<ALL_PIXELS; j++)
            t_tran_float[i][j] = (float)t_train[i][j];

    for(int i=0; i<NUM_ITERATIONS; i++){
        straight_conv_nn(&t_train[i][0], &result_ap_fixed[i][0]);
        straight_conv_nn_float(&t_tran_float[i][0], &result_float[i][0]);
    }

    int hw_err_count=0;
    int sw_err_count=0;
    for(int i=0; i<NUM_ITERATIONS; i++){
        max_id_hw = max_ap_fixed(&result_ap_fixed[i][0]);
        max_id_sw = max_float(&result_float[i][0]);
        max_id_ref = max_float(&t_test[i][0]);

        if(max_id_ref != max_id_hw){
            printf("id = %d, max_id_ref = %d, max_id_hw = %d\n", i, max_id_ref, max_id_hw);
            hw_err_count++;
        }
        if(max_id_ref != max_id_sw){
            printf("id = %d, max_id_ref = %d, max_id_sw = %d\n", i, max_id_ref, max_id_sw);
            sw_err_count++;
        }
    }
    if(hw_err_count==0 && sw_err_count==0)
        printf("No Error\n");
    else{
        printf("hw_err_count = %d\n", hw_err_count);
        printf("sw_err_count = %d\n", sw_err_count);
    }

    return(0);
}

int straight_conv_nn_float(float in[ALL_PIXELS], float out[NUM_OF_OUTPUT]){
    float buf[ROW_PIXELS][COULMN_PIXELS];
    float conv_out[NUM_OF_KERNELS][ROW_PIXELS-4][COULMN_PIXELS-4];
    float pool_out[NUM_OF_KERNELS][(ROW_PIXELS-4)/2][(COULMN_PIXELS-4)/2];
    float dot1[100];
    float dot2[NUM_OF_OUTPUT];

    buf_copy1: for(int i=0; i<ROW_PIXELS; i++)
        buf_copy2: for(int j=0; j<COULMN_PIXELS; j++)
            buf[i][j] = in[i*COULMN_PIXELS+j];

    // Convolutional Neural Network 5x5 kernel, Stride = 1, Padding = 0
    // + ReLU
    CONV1: for(int i=0; i<NUM_OF_KERNELS; i++){    // number of kernels
        CONV2: for(int j=0; j<ROW_PIXELS-4; j++){
            CONV3: for(int k=0; k<COULMN_PIXELS-4; k++){
                conv_out[i][j][k] = 0;
                CONV4: for(int m=0; m<5; m++){
                    CONV5: for(int n=0; n<5; n++){
                        conv_out[i][j][k] += buf[j+m][k+n] * conv1_fweight[i][0][m][n];
                    }
                }
                conv_out[i][j][k] += conv1_fbias[i];

                if(conv_out[i][j][k]<0)    // ReLU
                    conv_out[i][j][k] = 0;
            }
        }
    }

    // Pooling Kernel = 2 x 2, Stride = 2
    POOL1: for(int i=0; i<NUM_OF_KERNELS; i++){
        POOL2: for(int j=0; j<ROW_PIXELS-4; j += 2){
            POOL3: for(int k=0; k<COULMN_PIXELS-4; k += 2){
                POOL4: for(int m=0; m<2; m++){
                    POOL5: for(int n=0; n<2; n++){
                        if(m==0 && n==0){
                            pool_out[i][j/2][k/2] = conv_out[i][j][k];
                        } else if(pool_out[i][j/2][k/2] < conv_out[i][j+m][k+n]){
                            pool_out[i][j/2][k/2] = conv_out[i][j+m][k+n];
                        }
                    }
                }
            }
        }
    }

    af1_dot1: for(int col=0; col<100; col++){
        dot1[col] = 0;
        af1_dot2: for(int i=0; i<NUM_OF_KERNELS; i++){
            af1_dot3: for(int j=0; j<(ROW_PIXELS-4)/2; j++){
                af1_dot4: for(int k=0; k<(COULMN_PIXELS-4)/2; k++){
                    dot1[col] += pool_out[i][j][k]*af1_fweight[i*((ROW_PIXELS-4)/2)*((COULMN_PIXELS-4)/2)+j*((COULMN_PIXELS-4)/2)+k][col];
                }
            }
        }
        dot1[col] += af1_fbias[col];

        if(dot1[col] < 0)    // ReLU
            dot1[col] = 0;
    }

    af2_dot1: for(int col=0; col<NUM_OF_OUTPUT; col++){
        dot2[col] = 0;
        af2_dot2: for(int row=0; row<100; row++){
            dot2[col] += dot1[row]*af2_fweight[row][col];
        }
        dot2[col] += af2_fbias[col];

        out[col] = dot2[col];
    }

    return(0);
}

int max_ap_fixed(ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> out[NUM_OF_OUTPUT]){
    int max_id;
    ap_fixed<13, 7, AP_TRN_ZERO, AP_SAT> max;

    for(int i=0; i<NUM_OF_OUTPUT; i++){
        if(i == 0){
            max = out[0];
            max_id = 0;
        }else if(out[i]>max){
            max = out[i];
            max_id = i;
        }
    }
    return(max_id);
}

int max_float(float out[NUM_OF_OUTPUT]){
    int max_id;
    float max;

    for(int i=0; i<NUM_OF_OUTPUT; i++){
        if(i == 0){
            max = out[0];
            max_id = 0;
        }else if(out[i]>max){
            max = out[i];
            max_id = i;
        }
    }
    return(max_id);
}

2017/09/12 04:21 | DNN
