Text this: An Ontology-Based Framework for Complex Urban Object Recognition through Integrating Visual Features and Interpretable Semantics