[Project Retrospective Report V3.0] Independent Model Conversion: Overcoming the Technical Challenges of Converting Dynamic ONNX Models to RKNN (Final Version)
| Project Attribute | Details |
|---|---|
| Project Name | Part 3: Independent RKNN Model Conversion for Sherpa ASR on the RV1126BP Platform |
| Project Timeline | 2025-07-31 ~ 2025-08-09 (Estimated) |
| Project Duration | 31.5 hours (Approx. 3.9 person-days) |
| Review Date | 2025-08-20 |
| Core Personnel | Potter White |
1. Project Background and Kick-off
In the V2.0 project, we verified that the sherpa-onnx framework could not directly use NPU acceleration due to the outdated RKNN SDK version on the target platform. At the same time, the pure CPU solution with sherpa-ncnn had already reached its performance bottleneck. Therefore, to fully leverage the NPU hardware of the RV1126b platform, the core objective of this project (V3.0) was to bypass the compilation limitations of sherpa-onnx and independently complete the conversion from ONNX models to RKNN models.
The starting point for this work was to address the dynamic input dimension issue prevalent in Sherpa ASR models.
2. Foundational Problem Analysis and General Conversion Strategy
2.1 Understanding Dynamic Dimension [N, ...]
In ONNX models, the dimension `N` typically serves as a placeholder for a dynamic batch size. For example, an input shape of `[N, 39, 80]` means the model can process `N` input samples at once. However, to achieve the most efficient computation on embedded NPUs, the RKNN toolchain usually requires the model's input shape to be fixed and static.
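To make this concrete, a model's inputs can be inspected with the `onnx` Python package; dynamic dimensions appear as symbolic names (`dim_param`) rather than integers (`dim_value`). A minimal sketch, assuming the file is named `encoder.onnx`:

```python
import onnx

# Print each input's shape; dynamic dimensions show up as symbolic names
# (dim_param, e.g. 'N') instead of fixed integers (dim_value).
model = onnx.load("encoder.onnx")  # hypothetical file name
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # e.g. x ['N', 39, 80]
```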
2.2 Establishing a General Technical Workflow
Based on this analysis, I established a general three-step strategy for model conversion:
- Statically Reshape the ONNX Model: Write a script to fix all dynamic dimensions `N` in the original ONNX model to a specific value, typically `1`, to indicate single-input processing.
- Extract Model Metadata: Analyze and extract the unique `custom_string` metadata from the `sherpa-onnx` model. This data contains essential parameters for inference, such as `vocab_size`, which need to be provided to the RKNN toolchain during conversion.
- Perform the Conversion: Use `rknn-toolkit2` to convert the statically reshaped ONNX model.
This general workflow is effective for model components with relatively simple structures.
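As a concrete illustration of the first step, here is a minimal sketch using the standard `onnx` protobuf API; the file names are illustrative:

```python
import onnx

def fix_dynamic_batch(src_path: str, dst_path: str, value: int = 1) -> None:
    """Replace every symbolic (dynamic) dimension with a fixed value."""
    model = onnx.load(src_path)
    for tensor in list(model.graph.input) + list(model.graph.output):
        for dim in tensor.type.tensor_type.shape.dim:
            if dim.dim_param:  # symbolic name such as 'N'
                dim.ClearField("dim_param")
                dim.dim_value = value
    onnx.checker.check_model(model)
    onnx.save(model, dst_path)

fix_dynamic_batch("encoder.onnx", "encoder_fixed.onnx")
```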
3. Standard Conversion Process for encoder and joiner Models
The encoder and joiner models do not contain complex dynamic control flow operators internally. Therefore, they could strictly follow the general conversion strategy outlined above. The automated script I designed for this purpose executes as follows:
```mermaid
graph TD
    subgraph "User Action"
        A[Execute conversion script: ./convert.py encoder]
    end
    subgraph "Automated Script Execution Flow"
        B(1. Load original encoder.onnx) --> C{2. Check input dimensions};
        C -- Dynamic dimension 'N' found --> D[3. Change 'N' to 1];
        D --> E[4. Save as encoder_fixed.onnx];
        E --> F(5. Load original onnx to get metadata);
        F --> G[6. Construct custom_string];
        G & E --> H(7. Call rknn-toolkit2);
        H --> I[8. Generate encoder.rknn];
    end
    A --> B
    I --> J((Success))
```

Workflow Description:
- The script first loads the original `encoder.onnx` model.
- It programmatically inspects the dimensions of its input tensors to identify the dynamic dimension `N`.
- `N` is then modified to the static value `1`, generating an intermediate `_fixed.onnx` file.
- Concurrently, the script reads the original model's metadata using `onnxruntime` and formats it into the `custom_string` required by RKNN.
- Finally, the statically reshaped model and the metadata string are fed into `rknn-toolkit2`, completing the conversion and generating the final `encoder.rknn` file (sketched below).
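In code, the metadata and conversion steps might look like the following sketch. Reading `custom_metadata_map` is standard `onnxruntime` API, and `rknn-toolkit2` exposes a `custom_string` option in `config`; however, the exact serialization format shown and the `target_platform` value are assumptions that depend on the toolkit version.

```python
import onnxruntime as ort
from rknn.api import RKNN

# Pull the sherpa-onnx metadata (vocab_size, context_size, ...) out of the
# original model's custom_metadata_map.
sess = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])
meta = sess.get_modelmeta().custom_metadata_map

# Serialize it for embedding into the .rknn file; the key=value;... layout
# here is an assumed format, not a documented standard.
custom_string = ";".join(f"{k}={v}" for k, v in meta.items())

rknn = RKNN()
rknn.config(target_platform="rv1126b", custom_string=custom_string)
rknn.load_onnx(model="encoder_fixed.onnx")
rknn.build(do_quantization=False)
rknn.export_rknn("encoder.rknn")
rknn.release()
```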
The conversion process for the joiner model is identical to that of the encoder. However, when this standard process was applied to the decoder model, it resulted in a critical conversion failure.
4. Special Challenges and In-depth Debugging of the decoder Model
The decoder model contains internal dynamic control flow (an `If` operator) whose condition depends on input shapes. This caused the standard conversion process to fail and required additional, more complex preprocessing.
4.1 Failure of Initial Attempts and Problem Identification
I first attempted two direct methods, both of which failed, but these failures helped me precisely identify the root cause of the problem.
- Attempt 1 - Direct Conversion of the Dynamic Model: `rknn-toolkit2` failed immediately during the model loading phase, explicitly stating that it does not support the dynamic input dimension `N`.
- Attempt 2 - Specifying the Input Size During Conversion: By forcing the input size through a parameter in `rknn.load_onnx`, the model loaded successfully. However, it failed during the build phase (`rknn.build`) with the error: `All outputs ['decoder_out'] of model are constants.` (reproduced in the sketch below).
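For reference, the second attempt looked roughly like this sketch; the input name `y` and the shape `[1, 2]` are assumptions about the decoder's signature, not values taken from the project logs:

```python
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform="rv1126b")  # platform string is an assumption

# Attempt 2: force a static shape at load time instead of editing the model.
rknn.load_onnx(
    model="decoder.onnx",
    inputs=["y"],              # assumed decoder input name
    input_size_list=[[1, 2]],  # assumed static shape: [N=1, context_size=2]
)

# Loading succeeds, but building fails with:
#   "All outputs ['decoder_out'] of model are constants."
rknn.build(do_quantization=False)
```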
4.2 The Core Problem: Logical Failure Caused by Constant Folding Optimization
To resolve the core error, “All outputs are constants,” I shifted my focus to preprocessing the ONNX model itself.
- Action: I first executed the initial step of the standard process, fixing the dynamic input dimension `N` of `decoder.onnx` to `1` and generating `decoder_fixed.onnx`.
- Observation: When I used this static model for conversion, it reproduced the exact same error as in "Attempt 2."
- Root Cause Analysis:
  - Dynamic Control Flow: By analyzing the `decoder` model with the Netron visualization tool, I discovered an `If` operator inside it. The condition for this `If` operator depended on the shape of an intermediate tensor calculated by preceding `Shape` and `Gather` operators.
  - Logic Solidification: In the original dynamic model, this shape was variable, making the path of the `If` branch non-deterministic. However, once I fixed the input dimension `N` to `1`, this shape-dependent condition also became a constant (always `True` or always `False`).
  - Over-Optimization: During its `build` process, the RKNN toolchain performs an optimization called constant folding (`fold_constant`). When it detected that the `If` operator's condition was now constant, it "intelligently" pruned the branch that would never be executed. In the `decoder` model, this pruning triggered a chain reaction, causing an entire computation path from `Shape_7` to `Gemm_15` to be removed. Ultimately, the model's output, `decoder_out`, was incorrectly identified as a constant value independent of the input, which triggered the "invalid model" error.
4.3 The Final Solution: Introducing a Professional Model Simplification Tool
The root of the problem was how to handle the now-static If operator. After failed attempts at disabling optimizations and manually modifying the computation graph, I identified the final solution.
- Final Strategy: Add a crucial "model simplification" step to the standard workflow, using the professional Python library `onnx-simplifier`.
- The Role of `onnx-simplifier`: This tool performs constant folding correctly. It automatically evaluates the `If` operator's condition, safely prunes the now-static branch, and, most importantly, ensures that the resulting simplified ONNX model is topologically valid (see the sketch below).
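The simplification step reduces to a few lines with the `onnxsim` package (the PyPI distribution of `onnx-simplifier`); file names are illustrative:

```python
import onnx
from onnxsim import simplify

model = onnx.load("decoder_fixed.onnx")

# simplify() constant-folds the now-static If condition, prunes the dead
# branch, and returns a validity flag alongside the rewritten graph.
model_simplified, ok = simplify(model)
assert ok, "onnx-simplifier could not validate the simplified model"

onnx.save(model_simplified, "decoder_simplified.onnx")
```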
4.4 Final Conversion Workflow for the decoder Model
Based on the above analysis, I designed a specialized and more robust four-step conversion process for the decoder model and codified it into the automation script.
```mermaid
graph TD
    subgraph "User Action"
        A[Execute conversion script: ./convert.py decoder]
    end
    subgraph "Automated Script Execution Flow (Special Handling for Decoder)"
        B(1. Load original decoder.onnx) --> C{2. Check input dimensions};
        C -- Dynamic dimension 'N' found --> D[3. Change 'N' to 1];
        D --> E[4. Save as decoder_fixed.onnx];
        E --> F(5. **Call onnx-simplifier**);
        F --> G[6. **Prune static branches & reorder graph**];
        G --> H[7. Save as decoder_simplified.onnx];
        H --> I(8. Load original onnx to get metadata);
        I --> J[9. Construct custom_string];
        J & H --> K(10. Call rknn-toolkit2);
        K --> L[11. Generate decoder.rknn];
    end
    A --> B
    L --> M((Success))
```

Workflow Description:
Compared to the standard process for the encoder, the decoder workflow adds a critical simplification stage (steps 5-7): using `onnx-simplifier` to deeply optimize the statically reshaped `_fixed.onnx` model. This stage not only safely removes the problematic `If` operator but also ensures that the resulting `_simplified.onnx` is a legitimate model with a complete, topologically correct computation graph. This "sanitized" model can then be converted by `rknn-toolkit2` without any issues.
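A useful sanity check, sketched below under the assumption that the decoder takes a single int64 token-ID input of shape `[1, 2]`, is to confirm that `decoder_out` still varies with the input after simplification, i.e. that it was not folded into a constant:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("decoder_simplified.onnx",
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]

# Two different token-ID inputs (assumed int64 shape [1, 2]) should yield
# different outputs if the graph still depends on its input.
a = sess.run(None, {inp.name: np.array([[1, 2]], dtype=np.int64)})[0]
b = sess.run(None, {inp.name: np.array([[3, 4]], dtype=np.int64)})[0]
assert not np.allclose(a, b), "decoder_out is still constant w.r.t. the input"
```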
5. Project Outcomes and Conclusion
5.1 Key Achievements
- Successfully established a pipeline for converting complex ONNX models with dynamic control flow into RKNN models.
- Created a differentiated, standardized conversion process: a standard workflow for simple models and an enhanced workflow with an extra simplification step for complex models (such as the `decoder`).
- Produced a reusable, automated conversion script (`unified_onnx_to_rknn_converter.py`) that identifies the model type and applies the corresponding workflow, enabling "one-click" conversion of all model components (a sketch of this dispatch logic follows).
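The script itself is not reproduced here, but a hypothetical reconstruction of its dispatch logic might look like this; all helper names, file layouts, and the platform string are assumptions:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a unified converter: ./convert.py <encoder|decoder|joiner>."""
import sys

import onnx
from onnxsim import simplify
from rknn.api import RKNN

NEEDS_SIMPLIFY = {"decoder"}  # components whose static If branches must be pruned


def fix_batch(model: onnx.ModelProto, value: int = 1) -> onnx.ModelProto:
    """Pin every symbolic dimension (e.g. 'N') to a fixed value."""
    for tensor in list(model.graph.input) + list(model.graph.output):
        for dim in tensor.type.tensor_type.shape.dim:
            if dim.dim_param:
                dim.ClearField("dim_param")
                dim.dim_value = value
    return model


def convert(component: str) -> None:
    model = fix_batch(onnx.load(f"{component}.onnx"))
    if component in NEEDS_SIMPLIFY:
        model, ok = simplify(model)  # extra step: prune static If branches
        assert ok, "onnx-simplifier failed to validate the pruned graph"
    preprocessed = f"{component}_preprocessed.onnx"
    onnx.save(model, preprocessed)

    rknn = RKNN()
    rknn.config(target_platform="rv1126b")  # platform string is an assumption
    rknn.load_onnx(model=preprocessed)
    rknn.build(do_quantization=False)
    rknn.export_rknn(f"{component}.rknn")
    rknn.release()


if __name__ == "__main__":
    convert(sys.argv[1])  # e.g. ./convert.py encoder
```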
5.2 Conclusion
This technical effort demonstrates that when converting models with dynamic control flow (like If operators) for embedded platforms, the core challenge lies in bridging the gap between a dynamic model and static hardware requirements. Simply fixing input dimensions can turn shape-dependent control flow into dead code, causing the conversion tool's constant-folding optimization to make incorrect judgments. In such cases, preprocessing and sanitizing the model with a professional, validated optimization tool (like onnx-simplifier) before feeding it into the hardware vendor's toolchain is more reliable and efficient than attempting to manually modify the computation graph or tweak conversion-tool parameters.
The success of this phase lays a solid foundation for the subsequent development of a high-performance, NPU-based inference engine on the RV1126b platform.