Data integration provides uniform access to multiple sources that are maintained by different organizations, geographically distributed, and managed autonomously. The available sources are (a) large in number and growing, (b) heterogeneous, and (c) overlapping, replicated, or disjoint. An important issue is therefore to identify useful and semantically relevant data sources. In this paper, we present a methodology to identify relevant sources and rank them for selection in data integration on the basis of semantic similarity and data quality. The methodology is a preliminary step in the local-as-view (LAV) approach for pruning irrelevant, heterogeneous, and low-quality sources in a large and dynamic data integration environment.