Bug in PQR parser #1855

VonBoss · 2024-03-29T16:05:34Z

Summary

The Prody PQR parser fails to parse PQR files that do not have white space between their data fields. I have encountered this issue when the XYZ coordinates have 3 digits in front of the decimal place and are negative.

Example: lines from a file that failed to parse

ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30
ATOM      2    O STP     1     -30.495  65.669-122.669    0.00     3.08
ATOM      3    O STP     1     -25.792  61.516-124.085    0.00     3.32
ATOM      4    O STP     1     -25.262  61.506-124.230    0.00     3.05
ATOM      5    O STP     1     -30.521  65.070-122.439    0.00     3.05

There is no whitespace between the Y and Z coordinates in this file, so Prody fails to parse it.

Fix

Currently, Prody is parsing PQR files differently than PDB files by splitting each line on whitespaces. (prody/proteins/pdbfile.py)

  if not isPDB:
      fields = line.split()
  ......
  coordinates[acount, 0] = fields[6]
  coordinates[acount, 1] = fields[7]
  coordinates[acount, 2] = fields[8]

To avoid parsing issues Prody should parse PQR files by position/index which conforms to the file specification for a PQR file and matches other PQR parsers such as Biopython.

x = float(line[30:38])
y = float(line[38:46])
z = float(line[46:54])

Caveat

It appears that different softwares write the charge and radius information in different positions. (This may be the reason why Prody chose to parse on whitespace from the beginning.)

12345678901234567890123456789012345678901234567890123456789012345678901234
                Resn  Resi           X       Y       Z       Q       R

From https://pipe.rcc.fsu.edu/transcomp/PQRformat.htm
ATOM      1  N   VAL     1      16.783  48.812  26.447  0.0577   1.550
ATOM      2  H1  VAL     1      15.848  48.422  26.463  0.2272   1.200
ATOM      3  H2  VAL     1      16.734  49.803  26.251  0.2272   1.200
ATOM      4  H3  VAL     1      17.195  48.663  27.359  0.2272   1.200
ATOM      5  CA  VAL     1      17.591  48.101  25.416 -0.0054   1.700

From fpocket 4.0
ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30
ATOM      2    O STP     1     -30.495  65.669-122.669    0.00     3.08
ATOM      3    O STP     1     -25.792  61.516-124.085    0.00     3.32
ATOM      4    O STP     1     -25.262  61.506-124.230    0.00     3.05
ATOM      5    O STP     1     -30.521  65.070-122.439    0.00     3.05

From PyMol 2.5
ATOM     29  P     G    11     -17.189  -6.642 -23.827  0.00000000   0.000
ATOM     30  C5'   G    11     -15.744  -5.986 -25.911  0.00000000   0.000
ATOM     31  O5'   G    11     -16.783  -5.642 -25.005  0.00000000   0.000
ATOM     32  C4'   G    11     -15.722  -5.012 -27.074  0.00000000   0.000
ATOM     33  O4'   G    11     -16.947  -5.133 -27.838  0.00000000   0.000

The text was updated successfully, but these errors were encountered:

jamesmkrieger · 2024-04-09T07:43:52Z

Thank you. So, we should probably just use ranges for coordinates where they might lose the space and keep the field splitting with spaces for the rest.

jamesmkrieger · 2024-04-09T07:48:26Z

However, it seems like this could be a problem depending on the program too:
https://ics.uci.edu/~dock/manuals/apbs/html/user-guide/a2566.html#:~:text=The%20PQR%20format%20is%20the,with%20standard%20molecular%20graphics%20programs says "all fields are whitespace-delimited, thereby allowing coordinates which are larger/smaller than ± 999 Å"

jamesmkrieger · 2024-04-09T07:49:11Z

I suppose we need some kind of check for how many fields we get by splitting with whitespace delimiting and what we get if we use the standard columns

jamesmkrieger · 2024-04-09T08:19:11Z

ok, I think I have a fix. Please try checking out the pqr_fix branch from https://github.com/jamesmkrieger/ProDy.git and testing a few files. I have tried some example lines from the ones you provided and they work.

jamesmkrieger · 2024-04-17T14:03:38Z

I haven't merged this branch as I don't have any unit tests yet, but I guess at some point I can make them from the example snippets you gave here.

VonBoss · 2024-04-17T14:38:28Z

This looks like an appropriate fix to me.

jamesmkrieger · 2024-04-17T14:41:31Z

thank you

VonBoss · 2024-04-21T18:05:36Z

paresPQR() Test

I was able to test your fix this weekend. Your correction can't handle lines starting with HEADER, TER, CONECT, etc. because they don't have the same number of fields.

Input PQR file

HEADER
HEADER File generated using PyMol 2.5.
HETATM    1  N   LYS     0       1.422   1.796   0.198  0.00000000   0.000
HETATM    2  CA  LYS     0       1.394   0.355   0.484  0.00000000   0.000
HETATM    3  C   LYS     0       2.657  -0.284  -0.032  0.00000000   0.000
HETATM    4  O   LYS     0       3.316   0.275  -0.876  0.00000000   0.000
HETATM    5  CB  LYS     0       0.184  -0.278  -0.206  0.00000000   0.000
HETATM    6  CG  LYS     0      -1.102   0.282   0.407  0.00000000   0.000
HETATM    7  CD  LYS     0      -2.313  -0.351  -0.283  0.00000000   0.000
HETATM    8  CE  LYS     0      -3.598   0.208   0.329  0.00000000   0.000
HETATM    9  NZ  LYS     0      -4.761  -0.400  -0.332  0.00000000   0.000
HETATM   10  OXT LYS     0       3.050  -1.476   0.446  0.00000000   0.000
HETATM   11  H   LYS     0       1.489   1.891  -0.804  0.00000000   0.000
HETATM   12  H2  LYS     0       0.521   2.162   0.464  0.00000000   0.000
HETATM   13  HA  LYS     0       1.322   0.200   1.560  0.00000000   0.000
HETATM   14  HB2 LYS     0       0.210  -0.047  -1.270  0.00000000   0.000
HETATM   15  HB3 LYS     0       0.211  -1.359  -0.068  0.00000000   0.000
HETATM   16  HG2 LYS     0      -1.128   0.050   1.471  0.00000000   0.000
HETATM   17  HG3 LYS     0      -1.130   1.363   0.269  0.00000000   0.000
HETATM   18  HD2 LYS     0      -2.287  -0.120  -1.348  0.00000000   0.000
HETATM   19  HD3 LYS     0      -2.285  -1.432  -0.145  0.00000000   0.000
HETATM   20  HE2 LYS     0      -3.625  -0.023   1.394  0.00000000   0.000
HETATM   21  HE3 LYS     0      -3.626   1.289   0.192  0.00000000   0.000
HETATM   22  HZ1 LYS     0      -4.736  -0.185  -1.318  0.00000000   0.000
HETATM   23  HZ2 LYS     0      -4.735  -1.400  -0.205  0.00000000   0.000
HETATM   24  HZ3 LYS     0      -5.609  -0.031   0.071  0.00000000   0.000
HETATM   25  HXT LYS     0       3.861  -1.886   0.115  0.00000000   0.000
END

Error message

    [540]  if not isPDB:
    [541]       fields = line.split()
--> [542]     if fields[5].find('.') != -1:
    [543]           # coords too early as no child
    [544]           fields.insert(4, '')
    [545]       if len(fields) != 11:


IndexError: list index out of range

Fix

I moved your field splitting code so that it is only applied to ATOM and HETATM lines.

while i < stop:
    line = lines[i]
    startswith = line[0:6].strip()

    if startswith == 'ATOM' or startswith == 'HETATM':
        if isPDB:
            atomname = line[12:16].strip()
            if long_resname:
                resname = line[17:21].strip()
            else:
                resname = line[17:20].strip()
                
        else:
            fields = line.split()
            if fields[5].find('.') != -1:
            # coords too early as no chid
                fields.insert(4, '')
            if len(fields) != 11:
                try:
                    fields = fields[:6] + [line[30:38].strip(), line[38:46].strip(), line[46:54].strip()] + line[54:].split()
                except:
                    LOGGER.warn('wrong number of fields for PQR format at line %d'%i)
                    i += 1
                    continue
            
            atomname= fields[2]
            resname = fields[3]

New fix applied to an example with no XYZ whitespace

parsePQR() input file

HEADER
HEADER This is a pqr format file writen by the program fpocket.                 
HEADER It contains all the pocket vertices found by fpocket.                    
ATOM      1    C STP     1     -26.417  62.269-123.283    0.00     3.30
ATOM      2    O STP     1     -30.495  65.669-122.669    0.00     3.08
ATOM      3    O STP     1     -25.792  61.516-124.085    0.00     3.32
ATOM      4    O STP     1     -25.262  61.506-124.230    0.00     3.05
ATOM      5    O STP     1     -30.521  65.070-122.439    0.00     3.05
TER
END

parsePQR() --> writePQR() output file

ATOM       1  C    STP       1      -26.417   62.269 -123.283  0.0000  3.3000
ATOM       2  O    STP       1      -30.495   65.669 -122.669  0.0000  3.0800
ATOM       3  O    STP       1      -25.792   61.516 -124.085  0.0000  3.3200
ATOM       4  O    STP       1      -25.262   61.506 -124.230  0.0000  3.0500
ATOM       5  O    STP       1      -30.521   65.070 -122.439  0.0000  3.0500

Other bugs in pdbfile.py

I used the writePQR() function to do some of my testing, but this function only works if an alpha carbon is present in the atom group. I added a try block in the writePQRstream() function to quickly get around this issue. Requiring an alpha carbon to write a PQR file seems like a bug to me.

Quick fix writePQRstream():

    calphas = atoms.ca
    try:
        ssa = calphas.getSecstrs()
    except:
        ssa = None
    helix = []
    sheet = []

jamesmkrieger · 2024-04-21T18:58:59Z

Thank you very much! I’ll look over these in a month or so when I’m less busy

jamesmkrieger · 2024-04-21T18:59:49Z

(Closed it by accident)

jamesmkrieger · 2024-05-06T07:36:05Z

It looks like the last one should have the calphas line included in the try block

jamesmkrieger · 2024-05-15T17:21:14Z

ok, I've just added these two fixes to my branch and pull request too

@atbogetti, it would still be great if you could check this more thoroughly

jamesmkrieger mentioned this issue Apr 9, 2024

split pqr coords by line positions when needed #1860

Merged

jamesmkrieger closed this as completed Apr 21, 2024

jamesmkrieger reopened this Apr 21, 2024

atbogetti self-assigned this May 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in PQR parser #1855

Bug in PQR parser #1855

VonBoss commented Mar 29, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 17, 2024 •

edited

Loading

VonBoss commented Apr 17, 2024

jamesmkrieger commented Apr 17, 2024

VonBoss commented Apr 21, 2024

jamesmkrieger commented Apr 21, 2024

jamesmkrieger commented Apr 21, 2024

jamesmkrieger commented May 6, 2024

jamesmkrieger commented May 15, 2024

Bug in PQR parser #1855

Bug in PQR parser #1855

Comments

VonBoss commented Mar 29, 2024

Summary

Example: lines from a file that failed to parse

Fix

Caveat

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 9, 2024

jamesmkrieger commented Apr 17, 2024 • edited Loading

VonBoss commented Apr 17, 2024

jamesmkrieger commented Apr 17, 2024

VonBoss commented Apr 21, 2024

paresPQR() Test

Input PQR file

Error message

Fix

New fix applied to an example with no XYZ whitespace

parsePQR() input file

parsePQR() --> writePQR() output file

Other bugs in pdbfile.py

Quick fix writePQRstream():

jamesmkrieger commented Apr 21, 2024

jamesmkrieger commented Apr 21, 2024

jamesmkrieger commented May 6, 2024

jamesmkrieger commented May 15, 2024

jamesmkrieger commented Apr 17, 2024 •

edited

Loading