In recent years, efforts to narrow the semantic gap between human understanding of high-level semantics and the low-level features of video have relied mostly on video annotation performed downstream of the signal, that is, on labeling content again after it has entered a video database. Few researchers have considered the alternative of working from the front end of video acquisition (i.e., the video camera): using limited interaction and comprehensive segmentation techniques (including optical technologies), together with video semantic analysis, domain-specific sets of core concepts (i.e., ontologies), and shooting scripts and shooting-task descriptions, to attach semantic descriptions at different levels that enrich the attributes of objects and regions, and thereby to form a new video model based on video object plane (VOP) coding. Such a model has the potential for intelligent features, carries a large amount of metadata, and embeds intermediate-level semantic concepts. This paper focuses on the latter approach and proposes a new video model framework, provisionally named the "semantic-preload video model" (SPVM). In this model, the intermediate-level semantic labeling usually applied to video objects is performed upstream of the signal (i.e., at the video capture and production stage). To support this research, the paper also analyzes the hierarchical structure of video, divides the semantics involved in the video production process into nine hierarchical levels, and points out that semantic-level tagging (i.e., semantic preloading) concerns only the four middle levels, which is the goal this model aims to achieve.