BioTorrents :: Details for "Pfam hmm release 24"
Pfam hmm release 24

Download: Pfam_hmm_release_24.torrent (BitLet.org)
Info hash: 44defd014fdd22243e5e83323226d3108df2634f
Alternative Versions: (None Selected)

Description
This torrent contains *only* the Pfam-A.hmm and Pfam-B.hmm files from Pfam release 24.
If you are running HMMER3, these are the only files you need.

Instructions for use with HMMER:
- gunzip Pfam-A*
- hmmpress Pfam-A.hmm
- hmmpress Pfam-B.hmm
- hmmscan Pfam-A.hmm your_fasta_file.faa
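
The same steps can be scripted. What follows is a minimal sketch in Python, assuming the HMMER3 binaries hmmpress and hmmscan are on the PATH and that the downloaded files are gzip-compressed. The helper names (gunzip_file, build_commands, run_pipeline) are made up for illustration and are not part of HMMER or Pfam.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress foo.gz to foo in the same directory (the .gz file is kept)."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def build_commands(hmm_files, fasta_file):
    """Return argv lists for the hmmpress and hmmscan steps listed above."""
    commands = [["hmmpress", str(hmm)] for hmm in hmm_files]
    # Only the first library (Pfam-A) is scanned, as in the instructions.
    commands.append(["hmmscan", str(hmm_files[0]), str(fasta_file)])
    return commands


def run_pipeline(hmm_files, fasta_file):
    """Run each step in order; raises CalledProcessError if a step fails."""
    for argv in build_commands(hmm_files, fasta_file):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for argv in build_commands(["Pfam-A.hmm", "Pfam-B.hmm"], "your_fasta_file.faa"):
        print(" ".join(argv))
```

Calling run_pipeline(["Pfam-A.hmm", "Pfam-B.hmm"], "your_fasta_file.faa") would execute the steps for real; hmmscan writes its report to standard output unless redirected.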

Files were downloaded from:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/Pfam24.0/

Release notes from Pfam FTP site:

PFAM : Multiple alignments and profile HMMs of protein domains
    RELEASE 24.0
--------------------------------------

1. INTRODUCTION

  Pfam is a collection of protein family alignments which were
  constructed semi-automatically using hidden Markov models (HMMs).
  Sequences that are not covered by Pfam are clustered and aligned
  automatically, and released as Pfam-B.  Pfam families have
  permanent accession numbers and contain functional annotation and
  cross-references to other databases, while Pfam-B families are
  re-generated at each release and are unannotated.


2. LOCATIONS

  Pfam is available on the web at:
 
http://pfam.sanger.ac.uk/
http://pfam.cgb.ki.se/
http://pfam.janelia.org/
http://pfam.jouy.inra.fr/
http://pfam.ccbb.re.kr/

3. STATISTICS

                           Pfam                         Pfam-B
                   -----------------------      -----------------------
 Release   Date  families sequences  residues   families sequences  residues   Source
 -------  -----  -------- --------- ----------  -------- --------- ---------  ---------
 
   0.2    01/96       100     10431    2246421     11763     32081   9200334  Swiss 32
   1.0    04/96       175     15610    3560959     11929     31931   8957230  Swiss 33
   2.0    03/97       527     28170    6770529     13289     31349   8224614  Swiss 34
   2.1    10/97       527     28205    6790960     13289     31349   8224614  Swiss 34
   3.0    06/98       806     99043   22766133     33550     79544  20648530  Swiss 35    + SP-TrEMBL 5
   3.1    09/98      1313    114750   27573470     33550     79544  20648530  Swiss 35    + SP-TrEMBL 5
   3.2    10/98      1344    115155   27689081     33550     79544  20648530  Swiss 35    + SP-TrEMBL 5
   3.3    12/98      1390    119420   28085438     33550     79544  20648530  Swiss 35    + SP-TrEMBL 5
   3.4    01/99      1407    119963   28343136     33550     79544  20648530  Swiss 35    + SP-TrEMBL 5
   4.0    05/99      1465    147347   34476183    128689    123610  33470292  Swiss 37    + SP-TrEMBL 9
   4.1    07/99      1488    148195   34692597     36739     89640  22510097  Swiss 37    + SP-TrEMBL 9
   4.2    08/99      1664    155979   36683193     40017     99587  24062200  Swiss 37    + SP-TrEMBL 9
   4.3    09/99      1815    161833   37803491     39506     97492  23115975  Swiss 37    + SP-TrEMBL 9
   4.4    11/99      2000    164412   38411490     39200     96055  22552453  Swiss 37    + SP-TrEMBL 9
   5.0    01/00      2008    178110   41516321     39228     96077  22506088  Swiss 38    + SP-TrEMBL 11
   5.1    02/00      2015    179782   41704446     42357    103709  24762358  Swiss 38    + SP-TrEMBL 11
   5.2    03/00      2128    181068   42018555     42163    102843  24471000  Swiss 38    + SP-TrEMBL 11
   5.3    05/00      2216    183695   42512479     41974    102024  23952537  Swiss 38    + SP-TrEMBL 11
   5.4    06/00      2290    185251   42659663     41885    101728  23774015  Swiss 38    + SP-TrEMBL 11
   5.5    09/00      2478    190302   43837632     41232     99302  22716640  Swiss 38    + SP-TrEMBL 11
   6.0    01/01      2697    258321   59332756     40681     96571  21789591  Swiss 39    + SP-TrEMBL 14
   6.1    03/01      2727    260202   59586847     40230     96128  21545422  Swiss 39    + SP-TrEMBL 14
   6.2    04/01      2773    260570   59749821     58924    131100  30096444  Swiss 39    + SP-TrEMBL 14
   6.3    05/01      2847    261546   60210817     58539    130194  29602298  Swiss 39    + SP-TrEMBL 14
   6.4    05/01      2866    262071   60508404     58297    129257  29370968  Swiss 39    + SP-TrEMBL 14
   6.5    06/01      2929    264276   61192015     57891    127861  28892585  Swiss 39    + SP-TrEMBL 14
   6.6    08/01      3071    267598   61976627     57477    126378  28143196  Swiss 39    + SP-TrEMBL 14
   7.0    01/02      3360    409136   93681071     78233    179966  39684197  Swiss 40    + SP-TrEMBL 18
   7.1    03/02      3621    413112   94740523     77733    178097  38784346  Swiss 40    + SP-TrEMBL 18
   7.2    04/02      3735    417711   95253221     77408    177141  38445765  Swiss 40    + SP-TrEMBL 18
   7.3    05/02      3849    419102   95460308     83108    193509  41854966  Swiss 40    + SP-TrEMBL 18
   7.4    07/02      3882    419360   95369593     83166    193462  41907032  Swiss 40    + SP-TrEMBL 18
   7.5    08/02      4176    424307   96575650     82343    190360  40849796  Swiss 40    + SP-TrEMBL 18
   7.6    09/02      4463    428237   97335169     81669    187725  40073352  Swiss 40    + SP-TrEMBL 18
   7.7    10/02      4832    435353   99654093     80001    182610  38092326  Swiss 40    + SP-TrEMBL 18
   7.8    11/02      5049    437970  100096164     79427    181013  37469930  Swiss 40    + SP-TrEMBL 18
   8.0    02/03      5193    626452  142233861     76370    174607  35686841  Swiss 40.31 + SP-TrEMBL 22.0
   9.0    05/03      5722    705698  160653012     98158    232641  46847522  Swiss 41.0  + SP-TrEMBL 23.0
  10.0    07/03      6190    733829  167492698     96550    227325  45381141  Swiss 41.10 + SP-TrEMBL 23.15
  11.0    10/03      7255    805978  184093744     94757    220445  43457201  Swiss 41.25 + SP-TrEMBL 24.14
  12.0    01/04      7316    898590  205356517    108951    262041  53933483  Swiss 42.5  + SP-TrEMBL 25.6
  13.0    04/04      7426    899152  205367676    108119    260253  53251778  Swiss 42.12 + SP-TrEMBL 25.12
  14.0    05/04      7459    903115  206720690    107460    259106  52378429  Swiss 43.2  + SP-TrEMBL 26.2
  15.0    08/04      7503   1105589  251318620    140216    357340  70848731  Swiss 44.0  + SP-TrEMBL 27.0
  16.0    10/04      7677   1164599  264667462    139134    353883  69591421  Swiss 44.5  + SP-TrEMBL 27.5
  17.0    03/05      7868   1321755  297821068    129746    336353  63302856  Swiss 46.0  + SP-TrEMBL 29.0
  18.0    07/05      7973   1426410  322176782    128469    327279  61569471  Swiss 47.0  + SP-TrEMBL 30.0
  19.0    11/05      8183   1728628  391195934    127296    322743  60022455  Swiss 48.1  + SP-TrEMBL 31.1
  20.0    04/06      8296   2062824  468403714    126439    319735  59164603  Swiss 48.9  + SP-TrEMBL 31.9
  21.0    11/06      8957   2343023  532701643    186970    403510  71601041  Swiss 50.0  + SP-TrEMBL 33.0
  22.0    06/07      9318   2990695  679928271    182493    472700  87434067  Swiss 51.7  + SP-TrEMBL 34.7
  23.0    07/08     10340   3925943  890618067    223403   1029669 215585168  Swiss 54.5  + SP-TrEMBL 37.5
  24.0    07/09     11912   7079739 1627712293    142303    940849 171679060  Swiss 57.6  + SP-TrEMBL 40.6
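
  The rows of this table are whitespace-separated, so they are easy to load
  for further analysis. The sketch below is one possible way to do it in
  Python; the ReleaseStats type and the parse_row helper are names invented
  here, and the two sample rows are copied from the table above.

```python
from typing import NamedTuple


class ReleaseStats(NamedTuple):
    release: str
    date: str
    pfam_families: int
    pfam_sequences: int
    pfam_residues: int
    pfamb_families: int
    pfamb_sequences: int
    pfamb_residues: int
    source: str


def parse_row(line):
    """Split one data row of the statistics table into typed fields."""
    # Eight splits give nine fields; the last one (Source) keeps its spaces.
    parts = line.split(None, 8)
    if len(parts) != 9:
        raise ValueError(f"unexpected row: {line!r}")
    numbers = [int(p) for p in parts[2:8]]
    source = " ".join(parts[8].split())  # collapse runs of spaces
    return ReleaseStats(parts[0], parts[1], *numbers, source)


ROWS = """\
  23.0    07/08     10340   3925943  890618067    223403   1029669 215585168  Swiss 54.5  + SP-TrEMBL 37.5
  24.0    07/09     11912   7079739 1627712293    142303    940849 171679060  Swiss 57.6  + SP-TrEMBL 40.6
"""

prev, curr = (parse_row(line) for line in ROWS.splitlines())

# Net change in the number of Pfam families between the two releases.
print(curr.pfam_families - prev.pfam_families)  # 1572
```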


                           NCBI pfam            
                   -----------------------      
 Release   Date  families sequences  residues  seq coverage res coverage Source
 -------  -----  -------- --------- ---------- ------------ ------------ ------
  23.0    07/08     10305   8334719 1674388721       66.39%      50.93%  rel162
  24.0    08/09     11883  11223814 2279289638       69.51%      53.60%  rel172

                           Metaseq pfam
                   -----------------------
 Release   Date  families sequences   residues seq coverage res coverage
 -------  -----  -------- --------- ---------- ------------ ------------
  23.0    07/08      6438   3055905  448190722       46.21%       33.47%
  24.0    07/08      7841   4388839  705254062       66.37%       53.66%

4. CONSTRUCTION OF PFAM

  Pfam is based on a sequence database called Pfamseq - Pfamseq 24 is
  based on UniProt 15.6 (Swiss-Prot 57.6 and SP-TrEMBL 40.6).  These databases
  can be accessed at:

     ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/
     ftp://ftp.ebi.ac.uk/pub/databases/trembl/

  Pfamseq 24 contains 9421015 sequences and 3060696203 residues.

  Metaseq is a collection of metagenomic sequence sample sets that contains
  6612632 sequences and 1314268346 residues.

  NCBI non-redundant sequence database release 172 contains 16146658
  sequences and 4252521171 residues.

  Pfam-B has been constructed using ADDA.


5. DESCRIPTION OF CHANGES FROM RELEASE 23.0 to 24.0

  Release 24.0 contains a total of 11912 families, with 1808 new
  families and 236 families killed since the last release.  75.15% of
  all proteins in Pfamseq contain a match to at least one Pfam
  domain.  53.18% of all residues in the sequence database fall within
  Pfam domains.
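
  These figures can be cross-checked against sections 3 and 4. The short
  Python sketch below assumes that each coverage percentage is the matched
  total from the release 24.0 row in section 3 divided by the database size
  quoted in section 4; that reading is inferred from the arithmetic, not
  stated in these notes. The percent helper is a name chosen here.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Family count: release 23.0 total, plus new families, minus killed families.
families_23 = 10340
new_families = 1808
killed_families = 236
print(families_23 + new_families - killed_families)  # 11912

# name: ((matched sequences, matched residues) from section 3,
#        (total sequences, total residues) from section 4)
DATABASES = {
    "Pfamseq": ((7079739, 1627712293), (9421015, 3060696203)),
    "NCBI": ((11223814, 2279289638), (16146658, 4252521171)),
    "Metaseq": ((4388839, 705254062), (6612632, 1314268346)),
}

for name, (matched, total) in DATABASES.items():
    seq_cov = percent(matched[0], total[0])
    res_cov = percent(matched[1], total[1])
    print(f"{name}: seq coverage {seq_cov:.2f}%, res coverage {res_cov:.2f}%")
```

  Under that assumption the script reproduces the 75.15% and 53.18% quoted
  above, as well as the coverage columns of the NCBI and Metaseq tables.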

  Release 24.0 is the first release to be generated using HMMER3.
  This migration has necessitated changes in our file formats and
  users should be aware that HMMER3 is not backward compatible with
  HMMER2.  For a full list of changes please view the documentation
  on the Pfam website.
 
6. FUTURE FORMAT CHANGES

  No major changes to the format of the flatfile are planned for the
  next release.

  The swisspfam file will be deprecated from the next release.


7. DESCRIPTION OF RELEASE FILES

  relnotes.txt      - This file.
  userman.txt       - A fuller description of Pfam fields.
  Pfam-A.hmm        - Pfam-A HMMs in an HMM library searchable with the hmmscan program.
  Pfam-A.seed       - Annotation and seed alignments of all Pfam-A families in Pfam format.
  Pfam-A.full       - Annotation and full alignments of all Pfam-A families in Pfam format.
  Pfam-A.full.ncbi  - Annotation and full alignments of all Pfam-A families against NCBI genpept database.
  Pfam-A.full.meta  - Annotation and full alignments of all Pfam-A families against metagenomic datasets.
  Pfam-A.fasta      - A list of sequences in each Pfam-A family in fasta format.
  Pfam-A.dead       - All Pfam-A families that have been removed from the database.
  Pfam-B.hmm        - The first (and largest) 20,000 Pfam-B HMMs in an HMM library searchable with the hmmscan program.
  Pfam-B            - All Pfam-B families.
  Pfam-C            - A list of all the clans, containing annotation and lists of Pfam-A entries in the clan.
  diff              - A list of files for each family that have changed since the last release.
  pfamseq           - The underlying sequence database in fasta format.
  ncbiseq           - The NCBI genpept database in fasta format.
  metaseq           - The metagenomics sequences in fasta format.
  swisspfam         - Pfam domain organisation of all proteins in Pfamseq.
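
  To check what an HMM library such as Pfam-A.hmm or Pfam-B.hmm contains,
  the models can be listed without running HMMER. The sketch below assumes
  the HMMER3 ASCII save format, in which each model carries a "NAME" line,
  an optional "ACC" line, and ends with a line containing "//". The
  iter_models helper and the sample text are invented for illustration;
  the sample is not real Pfam data.

```python
def iter_models(lines):
    """Yield (name, accession) for each model in an HMMER3 ASCII HMM stream.

    accession is None when a model has no ACC line.
    """
    name = acc = None
    for line in lines:
        if line.startswith("NAME "):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("ACC "):
            acc = line.split(None, 1)[1].strip()
        elif line.startswith("//"):
            # "//" closes the current model record.
            if name is not None:
                yield name, acc
            name = acc = None


# Synthetic two-model sample (made-up names, headers and values).
SAMPLE = """\
HMMER3/b [synthetic example header]
NAME  Example_dom
ACC   PF99999.1
LENG  10
//
HMMER3/b [synthetic example header]
NAME  Example_b
LENG  5
//
"""

for name, acc in iter_models(SAMPLE.splitlines()):
    print(name, acc)

# With a real library, count the models by streaming the file:
#     with open("Pfam-B.hmm") as handle:
#         print(sum(1 for _ in iter_models(handle)))
```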




8. DESCRIPTION OF FIELDS

  Compulsory fields:
  ------------------

  AC   Accession number:           Accession number in form PFxxxxx.version or PBxxxxxx.
  ID   Identification:             One word name for family.
  DE   Definition:                 Short description of family.
  AU   Author:                     Authors of the entry.
  SE   Source of seed:             The source suggesting the seed members belong to one family.
  GA   Gathering method:           Search threshold to build the full alignment.
  TC   Trusted Cutoff:             Lowest sequence score and domain score of match in the full alignment.
  NC   Noise Cutoff:               Highest sequence score and domain score of match not in full alignment.
  TP   Type:                       Type of family -- presently Family, Domain, Motif or Repeat.
  SQ   Sequence:                   Number of sequences in alignment.
  //                               End of alignment.
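
  The two accession forms in the AC field can be told apart mechanically.
  The following Python sketch assumes that each "x" stands for one decimal
  digit, that the version is a positive integer, and that the PB form
  belongs to Pfam-B; the classify_accession helper and the accessions in
  the example are made up.

```python
import re

# PF + 5 digits + "." + version, or PB + 6 digits (assumed reading of the AC line).
ACCESSION_RE = re.compile(r"(?:PF\d{5}\.\d+|PB\d{6})")


def classify_accession(acc):
    """Return 'Pfam-A', 'Pfam-B', or None if acc matches neither form."""
    if ACCESSION_RE.fullmatch(acc) is None:
        return None
    return "Pfam-A" if acc.startswith("PF") else "Pfam-B"


for acc in ("PF99999.1", "PB999999", "PF9999"):
    print(acc, classify_accession(acc))
```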

  Optional fields:
  ----------------

  DC   Database Comment:           Comment about database reference.
  DR   Database Reference:         Reference to external database.
  RC   Reference Comment:          Comment about literature reference.
  RN   Reference Number:           Reference Number.
  RM   Reference Medline:          Eight digit medline UI number.
  RT   Reference Title:            Reference Title.
  RA   Reference Author:           Reference Author.
  RL   Reference Location:         Journal location.
  PI   Previous identifier:        Record of all previous ID lines.
  KW   Keywords:                   Keywords.
  CC   Comment:                    Comments.
  NE   Pfam accession:             Indicates a nested domain.
  NL   Location:                   Location of nested domains - sequence ID, start and end of insert.

  Obsolete fields:
  ----------------
  AL   Alignment method of seed:   The method used to align the seed members.
  AM   Alignment Method:           The order in which ls and fs hits are aligned to the model to build the full alignment.
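
  The field codes above can be collected into a simple lookup when reading
  a flatfile. The sketch below assumes that annotation lines are written in
  Stockholm style as "#=GF <code> <value>" and that "//" ends a record; this
  layout is not spelled out in these notes, so check it against
  userman.txt. The helper names and the sample record are hypothetical.

```python
from collections import defaultdict

# Compulsory two-letter codes from the list above ("//" is the terminator).
COMPULSORY = ("AC", "ID", "DE", "AU", "SE", "GA", "TC", "NC", "TP", "SQ")


def parse_annotation(lines):
    """Collect '#=GF XX value' lines into {code: [values]}, stopping at '//'."""
    fields = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            break
        if line.startswith("#=GF "):
            parts = line.split(None, 2)
            if len(parts) == 3:
                # Repeated codes (e.g. several CC lines) keep their order.
                fields[parts[1]].append(parts[2].strip())
    return dict(fields)


def missing_compulsory(fields):
    """Return the compulsory codes that are absent from a parsed record."""
    return [code for code in COMPULSORY if code not in fields]


# Made-up record for illustration; not a real Pfam entry.
SAMPLE = """\
#=GF ID   Example_dom
#=GF AC   PF99999.1
#=GF DE   Hypothetical example domain
#=GF CC   First comment line.
#=GF CC   Second comment line.
//
"""

record = parse_annotation(SAMPLE.splitlines())
print(record["AC"])
print(missing_compulsory(record))
```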

9. ACKNOWLEDGEMENTS
 
  We are grateful to the many people who contributed data:
  L. Aravind, Laurence Etwiller, Matthew Bashton, Peer Bork, Richard
  Copley, Tim Dudgeon, Anton Enright, Nicola Kerrison,  Nina Mian,
  William Mifsud, Chris Ponting, Joerg Schultz, Val Wood, David Waterfield,
  Simon Moxon, Dan Haft, Owen White, Matthew Fenech, Stephen
  Sammut, Joanne Pollington and O. Luke Gavin as well as many others.
 

10. REFERENCES

  Papers on Pfam are listed below:

  i)    Sonnhammer ELL, Eddy SR, Durbin R. Proteins: Structure,
        Function and Genetics 28:405-420 (1997).

  ii)   Sonnhammer ELL, Eddy SR, Birney E, Bateman A, Durbin R.
        Nucleic Acids Research 26:320-322 (1998).

  iii)  Bateman A, Birney E, Durbin R, Eddy SR, Finn RD, Sonnhammer ELL.
        Nucleic Acids Research 27:260-262 (1999).

  iv)   Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer ELL.
        Nucleic Acids Research 28:263-266 (2000).

  v)    Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR,
        Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer ELL.
        Nucleic Acids Res. 30:276-280 (2002).

  vi)   Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S,
        Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C,
        Eddy SR.
        Nucleic Acids Res. 32(1):D138-41 (2004).

  vii)  Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T,
        Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer
        EL, Bateman A.
        Nucleic Acids Res. 34(Database issue):D247-251 (2006).

  viii) Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K,
        Eddy SR, Sonnhammer EL, Bateman A.
        Nucleic Acids Res. 36(Database issue):D281-288 (2008).

  Please reference the most recent paper.


11. THE PFAM CONSORTIUM

  Pfam is maintained by a consortium of researchers. You can contact
  the Pfam consortium at:
      pfam-help  "at"  sanger.ac.uk

  The current members of the Pfam consortium are:

  Alex Bateman, Penny Coggill, Richard Durbin, Robert Finn, Jaina Mistry, John Tate:
  The Wellcome Trust Sanger Institute, UK.

  Lorenzo Cerruti: ISREC, Switzerland.

  Andreas Heger: University of Oxford, UK.

  Erik Sonnhammer, Kristoffer Forslund: Stockholm Bioinformatics Centre, Sweden.

  Sean Eddy, Goran Ceric: Janelia Farm Research Campus, USA.
 

12. COPYRIGHT NOTICE

  Pfam - A database of protein domain family alignments and HMMs
  Copyright (C) 1996-2009 The Pfam consortium.

  This database is free; you can redistribute it and/or modify it
  under the terms of the GNU Library General Public License as
  published by the Free Software Foundation; either version 2 of the
  License, or (at your option) any later version.

  In summary, you are free to redistribute *verbatim* copies of Pfam
  or any Pfam files in any way you like, including packaging Pfam in
  proprietary software, so long as your copy of Pfam retains our
  copyright notice and the GNU license. You may also make *modified*
  copies of Pfam and distribute them, but your derivative database
  must be freely distributed under the GNU LGPL. Many academic
  freeware licenses prohibit any form of commercial use. In contrast,
  the intent of our license is that Pfam should be freely available
  to both industrial and academic researchers, including the use of
  the Pfam database in commercial software; however, proprietary
  modifications of the Pfam database itself are
  prohibited. Proprietary modification of the Pfam database is
  possible only by a separate formal licensing agreement from the
  Pfam consortium and our host institutions. See the file GNULICENSE
  for the full text of the GNU Library General Public License.

  This database is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Library General Public License for more details.

  You may also obtain a copy of the GNU LGPL by writing to the Free
  Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
  02111-1307, USA.

___________________
The Pfam Consortium
2009

Visible: no (dead)
Category: Genomics
License: (None Selected)
Last seeder: Last activity 2 days ago
Size: 512.38 MB (537,265,074 bytes)
Added: Nov 19 2009, 01:07 PM
Views: 1274
Hits: 468
Downloaded: 43 time(s)
Upped by: mlangill
Num files: 4 files
Peers: 0 seeder(s), 1 leecher(s) = 1 peer(s) total
