Data analysis and pattern recognition in multiple databases

Pattern recognition in data is a well-known classical problem that falls under the ambit of data analysis. As we need to handle different data, the nature of patterns, their recognition and the types of data analyses are bound to change. Because data collection channels have recently grown in number and become more diversified, many real-world data mining tasks can easily acquire multiple databases from various sources. In these cases, data mining becomes more challenging for several essential reasons. We may encounter sensitive data originating from different sources that cannot be amalgamated. Even if we are allowed to place different data together, we are certainly not able to analyse them when local identities of patterns are required to be retained.

The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

Directory of Open Access Journals (Sweden)

Abstract

Background: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources, or when querying data providers with one flavour of protein identifier when the source database uses another. Partial solutions for protein identifier mapping exist, but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs.

Results: We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface.

Conclusion: We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface.
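The idea of cross-referencing by 100% sequence identity, with optional limits on source database, taxonomic ID and activity status, can be illustrated with a small Python sketch. This is not PICR's code and does not call the PICR service: the `Entry` record, the function names, the database labels, the accessions and the sequences are all hypothetical, and using a SHA-256 digest as the identity key is an assumption made for the example (the abstract does not say how UniParc compares sequences).

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One protein record in a toy archive (all values below are made up)."""
    database: str
    accession: str
    taxon_id: int
    active: bool
    sequence: str


def sequence_key(sequence: str) -> str:
    """Strip whitespace, upper-case, and hash, so identical sequences share a key."""
    normalised = "".join(sequence.split()).upper()
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def build_index(entries):
    """Group archive entries by the hash of their exact sequence."""
    index = defaultdict(list)
    for entry in entries:
        index[sequence_key(entry.sequence)].append(entry)
    return index


def map_sequence(index, sequence, databases=None, taxon_id=None, active_only=False):
    """Return entries whose sequence is identical to the query, optionally filtered."""
    hits = index.get(sequence_key(sequence), [])
    return [
        e for e in hits
        if (databases is None or e.database in databases)
        and (taxon_id is None or e.taxon_id == taxon_id)
        and (not active_only or e.active)
    ]


def map_accession(entries, index, accession, **filters):
    """Map an identifier by finding its sequence, then matching on identity."""
    results = []
    for entry in entries:
        if entry.accession == accession:
            for hit in map_sequence(index, entry.sequence, **filters):
                if hit not in results:
                    results.append(hit)
    return results


# Hypothetical archive: three databases, two distinct sequences.
ARCHIVE = [
    Entry("DB_A", "A0001", 9606, True, "MKTAYIAKQR"),
    Entry("DB_B", "B-17", 9606, True, "MKTAYIAKQR"),
    Entry("DB_C", "C_old", 9606, False, "MKTAYIAKQR"),
    Entry("DB_B", "B-42", 10090, True, "MKTAYIAKQL"),
]
INDEX = build_index(ARCHIVE)

if __name__ == "__main__":
    for hit in map_accession(ARCHIVE, INDEX, "A0001", active_only=True):
        print(hit.database, hit.accession)
```

Keying the index on a digest of the normalised sequence makes each lookup a single dictionary access, and a one-residue difference yields a different key, which is what restricts matches to exact identity in this sketch.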