Large scale de novo protein design

Protein design efforts have typically focused on obtaining a few successful designs, either as proof that a method is feasible or for use in a specific application. This quest for success led to the development of powerful software and enabled spectacular feats of molecular engineering. Still, many of these achievements remain singularly impressive, even years later, because they required substantial human effort and skill: to craft a design process capable of producing some successes, to hand-craft specific designs in some cases, and to separate the successes from the many failures by time-consuming experimental characterization. These challenges stem from gaps in our understanding of protein biophysics, which make our models unreliable and limit the wider adoption of protein design.

My work is helping to transform protein design into a tool for large-scale analysis of protein biophysics. Each designed protein is a new experiment revealing how properties like folding stability and binding affinity result from protein structure. With the power of de novo protein design, we can place these experiments in the most informative regions of sequence and structure space, without being limited to the protein structures that happen to have evolved in nature. Despite this potential, and even though protein design has long benefited from computational modeling, protein design experiments have rarely been used to improve computational modeling. This is because testing designed proteins is typically slow and laborious, and is limited by the cost of DNA for each design.

My research takes advantage of advances in DNA synthesis, DNA sequencing, and high-throughput approaches such as flow cytometry and mass spectrometry to experimentally analyze thousands of unique designed sequences in parallel. I recently demonstrated the power of this approach by assaying 15,000 de novo designed minimal proteins for folding stability, revealing the determinants of stability for these proteins and highlighting key areas where the computational design model needed improvement. Feeding these data back into the design process increased the success rate of the designs from 6% to 47% across four generations of design and testing. This new scale of protein design introduces a completely new route to improving protein modeling: large-scale analysis of both the proteins of the past and the proteins of the future.

This work was recently published:

De novo design of miniature proteins as novel inhibitors

De novo protein design is the process of designing a folded protein structure and sequence without using an existing, experimentally determined protein conformation as the starting point. The ability to carefully sculpt a de novo protein's conformation makes this a powerful method for difficult protein design challenges, and the exceptional thermostability of de novo proteins makes them potentially advantageous for use as protein therapeutics and diagnostics.

I am applying de novo approaches to design protein inhibitors of protein-protein interactions relevant to cancer, asthma, and malaria. To minimize cost and potential immunogenicity, I am focusing on very small designs, such as the designed AMA1-RON2 interaction inhibitor pictured above. This interaction is critical to the invasion process in malaria progression, and is heavily studied by our collaborators in Prakash Srinivasan's lab at Johns Hopkins.

This work is ongoing, and my poster on malaria shows the design process and early results.

Binding Free Energy Calculations for Charged Compounds

Free energy calculations based on molecular dynamics simulations have the potential to accurately predict protein-ligand binding. However, these calculations are very computationally expensive, making it necessary to employ highly simplified systems for development and testing. We tested whether binding free energy calculations could predict affinities for charged compounds using the simplified cytochrome c peroxidase system (shown at right), an engineered protein cavity that can bind many simple organic cations. The test was performed blind, without knowledge of the experimental affinities or poses of the compounds, including whether they would bind at all. The results using a standard force field (previously shown to be reasonably accurate for simpler, uncharged compounds) predicted all compounds (charged and neutral) to bind too strongly, with systematically larger errors for the more polar compounds. A charge scaling scheme was able to correct some of these errors prospectively.
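To make "bind too strongly" concrete, it helps to recall how a binding free energy maps onto a dissociation constant. The following is a minimal sketch of the standard relation ΔG = RT ln(Kd) with a 1 M standard state; the function names and the example values are illustrative assumptions, not taken from the published work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) for a Kd given in mol/L.

    Assumes a 1 M standard state, so the Kd is used as a unitless ratio.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


def dissociation_constant(dg_kcal, temperature_k=298.15):
    """Inverse relation: Kd (mol/L) from a binding free energy (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temperature_k))


# Hypothetical example: a 1 micromolar binder, and the same compound
# predicted 1.4 kcal/mol too favorable by a calculation.
dg_experiment = binding_free_energy(1e-6)
kd_predicted = dissociation_constant(dg_experiment - 1.4)
print(f"dG(exp) = {dg_experiment:.2f} kcal/mol, predicted Kd = {kd_predicted:.2e} M")
```

Because RT ln(10) is about 1.36 kcal/mol at room temperature, every 1.4 kcal/mol of overestimated binding strength corresponds to roughly a tenfold error in the predicted affinity.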

This work was published in:

In the course of planning these calculations, we also developed a new thermodynamic cycle for relative binding free energy calculations, called the Separated Topologies method. This method combines the versatility of absolute binding free energy calculations (allowing comparisons between arbitrary, unrelated ligands) and the efficiency of relative binding free energy calculations.
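For orientation, the generic thermodynamic cycle behind any relative binding free energy calculation can be written as below. This is the textbook identity only; the labels A and B for the two ligands and the "complex" and "solvent" leg names are notation assumed here, and the way the Separated Topologies method actually constructs the two alchemical legs is described in the publication rather than on this page.

```latex
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
```

The first equality is what an absolute calculation delivers (one binding free energy per ligand, then a subtraction); the second is what a relative calculation delivers (one transformation of A into B in the bound state and one in solution).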

The method was published in:

Sensitivity of binding free energy calculations to force field parameters

Binding free energy calculations depend on empirical force fields with hundreds of parameters, and it is commonly assumed that these parameters must be highly accurate for the resulting affinities to agree with experiment. We investigated the sensitivity of binding free energy calculations to the non-bonded energy parameters in force fields: atomic radii, dispersion well-depths, and partial charges. This provided practical insight into how accurate force field parameters need to be to obtain accurate binding affinities.

The sensitivity to charge parameters is especially interesting because these electrostatic interactions are modulated by the dielectric environment of the binding site. Examining the sensitivity of binding affinities to charge parameters reveals the amount of dielectric screening in explicit solvent simulations. By comparing explicit solvent free energy calculation results to continuum dielectric results based on the Poisson-Boltzmann model, we can determine the effective protein dielectric constant in the explicit solvent calculations. This effective dielectric constant appears to vary significantly across the spatial extent of a ligand (shown above).
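The idea of reading a dielectric constant out of a charge sensitivity can be illustrated with a deliberately simplified model. The sketch below swaps the Poisson-Boltzmann reference for plain Coulomb's law in vacuum, estimates the sensitivity of a binding free energy to one ligand partial charge by central finite difference, and takes the ratio. All function names, coordinates, and free energy values are hypothetical and chosen only to show the arithmetic.

```python
import math

COULOMB_KCAL = 332.06371  # e^2 / (4*pi*eps0) in kcal*Angstrom/mol


def vacuum_potential(site, charges):
    """Unscreened electrostatic potential (kcal/mol per unit charge) at `site`.

    `charges` is a list of ((x, y, z), q) pairs standing in for protein atoms.
    """
    return sum(COULOMB_KCAL * q / math.dist(site, pos) for pos, q in charges)


def finite_difference_sensitivity(dg_plus, dg_minus, delta_q):
    """Central-difference estimate of d(dG_bind)/dq for one ligand atom."""
    return (dg_plus - dg_minus) / (2.0 * delta_q)


def effective_dielectric(vacuum_sensitivity, observed_sensitivity):
    """Screening factor: unscreened response divided by observed response."""
    return vacuum_sensitivity / observed_sensitivity


# Hypothetical setup: one ligand atom 4 Angstrom from a single -1 protein charge.
ligand_atom = (0.0, 0.0, 0.0)
protein_charges = [((4.0, 0.0, 0.0), -1.0)]
unscreened = vacuum_potential(ligand_atom, protein_charges)

# Hypothetical explicit-solvent results with the atom's charge shifted by +/-0.05 e.
observed = finite_difference_sensitivity(dg_plus=-9.0375, dg_minus=-6.9625, delta_q=0.05)

print(f"effective dielectric ~ {effective_dielectric(unscreened, observed):.2f}")
```

Repeating this ratio atom by atom is one way to picture how an effective dielectric could differ from one part of a ligand to another.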

This work was published in:

Finite-size Artifacts

The molecular dynamics simulations that power free energy calculations are commonly conducted using periodic boundary conditions. When binding free energies are calculated for charged ligands, the periodic boundary conditions introduce artifactual energy contributions which often depend on the size of the periodic unit cell employed in the calculation. In other words, the binding free energy calculated from the simulations will depend on the arbitrary choice of the cell size, making it impossible to compare simulations with experimental measurements.
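The cell-size dependence is easiest to see in the simplest possible case: a single point charge in a cubic periodic cell with a uniform neutralizing background, treated by lattice summation in vacuum. The sketch below evaluates that lattice self-energy using the standard cubic-lattice constant of about -2.837297; it is only the leading vacuum term, not the correction scheme developed in this work, and in a solvated protein-ligand system the net artifact includes solvent screening and further contributions.

```python
COULOMB_KCAL = 332.06371  # e^2 / (4*pi*eps0) in kcal*Angstrom/mol
XI_CUBIC = -2.837297      # lattice-sum self-term constant for a cubic cell


def lattice_self_energy(charge, box_length):
    """Vacuum self-energy (kcal/mol) of a point charge interacting with its
    own periodic images and neutralizing background in a cubic cell.

    `charge` is in units of e and `box_length` in Angstrom.
    """
    return 0.5 * COULOMB_KCAL * XI_CUBIC * charge ** 2 / box_length


# The term decays only as 1/L, so it is still sizable for typical cell sizes.
for box_length in (40.0, 60.0, 80.0):
    energy = lattice_self_energy(1.0, box_length)
    print(f"L = {box_length:5.1f} A -> {energy:7.3f} kcal/mol")
```

Because the term scales with the square of the charge and inversely with the cell edge, it vanishes for neutral ligands but changes by several kcal/mol across commonly used cell sizes for a singly charged one.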

These artifacts are well understood for the case of calculating the solvation free energy of an ion (1, 2), and analytical correction terms have been developed to remove the finite size artifacts that affect these calculations. However, it has been difficult to develop analogous correction terms for protein-ligand binding. Recently we investigated these artifacts empirically in the same cytochrome c peroxidase system studied above, and introduced a new and accurate method for removing these artifacts, illustrated above. The approach introduces the concept of residual integrated potential (RIP), which is essential for correcting binding affinities of charged compounds.

This work was published in: