# Architecture and self-assembly of the jumbo bacteriophage nuclear shell – Nature

### Bacterial and phage growth conditions

P. chlororaphis 200-B21 was grown on hard agar (HA) medium (10 g l−1 Bacto-Tryptone, 5 g l−1 NaCl and 10 g l−1 agar in distilled water) at 30 °C. 201phi2-1 lysates were collected as previously described with minor modifications2. In brief, 0.5 ml from a dense P. chlororaphis culture grown in HA liquid media (HA medium with no agar added) was infected with 10 μl serial dilutions of high-titre 201phi2-1 lysate (10^11–10^12 pfu ml−1), incubated for 15 min at room temperature, mixed with 4.5 ml of HA 0.35% top agar, poured over HA plates, and incubated overnight at 30 °C. The following day, 5 ml phage buffer was added to web lysis plates and incubated at room temperature for 5 h. The phage lysate was collected by aspiration, cell debris was pelleted by centrifugation at 3,220g for 10 min, and the resulting clarified phage lysate was stored at 4 °C. E. coli strain APEC 2248 was obtained from DSMZ and grown in LB medium at 37 °C. Goslar lysate (10^10 pfu ml−1) was provided by J. Wittmann and stored at 4 °C. All bacterial strains used in this study are listed in Supplementary Table 10.

### Plasmid construction and expression

The pHERD-30T plasmid was used for expressing eGFP-tagged full-length and truncated Chimallin in P. chlororaphis22,23. All constructs were designed with eGFP fused to the N terminus and synthesized by GenScript. The plasmids were electroporated into P. chlororaphis, and 25 μg ml−1 gentamicin sulfate was used for selection of colonies. All plasmids used in this study are listed in Supplementary Table 10.

### Live-cell fluorescence microscopy and image analysis

P. chlororaphis cells (5 μl of culture at A600 = 0.6) were inoculated on imaging pads in welled microscope slides. The imaging pads were composed of 1% agarose, 25% HA broth, 25 μg ml−1 gentamicin sulfate, 2 μg ml−1 FM4-64, 0.2 μg ml−1 DAPI and 1% arabinose to induce expression. The slides were incubated at 30 °C for 3 h in a humid chamber, and 10 μl of undiluted high-titre 201phi2-1 lysate was added to the pads to infect the cells 60 min before imaging. Samples were imaged using the DeltaVision Elite deconvolution microscope (Applied Precision). Images were deconvolved using the aggressive algorithm in the DeltaVision softWoRx program-v6.5.2.

All image analysis was performed on images prior to deconvolution. Average protein incorporation into the 201phi2-1 phage nucleus structure was determined using FIJI by measuring the mean grey value of the ring of GFP intensity that denotes the phage nucleus and the mean grey value of cytoplasmic GFP outside of this ring. The ratio of mean grey values of this ring GFP to cytoplasmic GFP was calculated as the average incorporation. Representative cells were chosen from each dataset, and a 3D graph of normalized GFP intensity was generated in MATLAB 2019a. Statistical analyses were performed using Prism-v9.3 (GraphPad Software).
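The incorporation metric described above is a simple ratio of two mean grey values. The following is a minimal sketch of that arithmetic, not the FIJI workflow itself; the function name and the pixel values are hypothetical and only illustrate the calculation.

```python
from statistics import mean


def incorporation_ratio(ring_pixels, cytoplasm_pixels):
    """Ratio of the mean grey value of the GFP ring to that of cytoplasmic GFP."""
    ring_mean = mean(ring_pixels)
    cyto_mean = mean(cytoplasm_pixels)
    if cyto_mean <= 0:
        raise ValueError("cytoplasmic mean grey value must be positive")
    return ring_mean / cyto_mean


# Hypothetical grey values sampled from one cell (not data from the study).
ring = [820, 790, 805, 835]
cytoplasm = [210, 190, 200, 200]
ratio = incorporation_ratio(ring, cytoplasm)
print(f"average incorporation: {ratio:.2f}")
```

A ratio above 1 would indicate enrichment of the GFP signal in the ring relative to the cytoplasm.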

### Growth curves

Cultures of P. chlororaphis transformed with either an empty vector or one of the chimallin constructs were grown in HA broth containing 25 μg ml−1 gentamicin sulfate to an A600 between 0.6 and 0.8 and then back-diluted to an A600 of 0.1 in medium containing 1% arabinose in 96-well plates. Serial tenfold dilutions of 201phi2-1 lysate were added and growth was monitored by A600 at 10-min intervals for 8 h with continuous shaking at 30 °C. All growth curves were performed in duplicate, with duplicate wells (a total of four wells averaged per data point).

### Cryo-EM of in situ samples and image acquisition

For grid preparation of 201phi2-1 infections, host bacterial cells were infected on agarose pads as previously described2 for 50–60 min. For grid preparation of Goslar infections, 10 agarose pads (1% agarose, 25% LB) were prepared in welled slides and spotted with 10 µl E. coli (APEC 2248) cells at an A600 of ~0.35 then incubated at 37 °C for 1.5 h in a humidor. Ten microlitres of Goslar lysate from the DSMZ was added to each pad. At approximately 30 mpi, a portion of the infected cells was collected at room temperature and delivered for plunge-freezing. Infected cells were collected by the addition of 25 µl of 25% LB to each pad and gentle scraping with the bottom of an eppendorf tube followed by aspiration. A portion of the collection was aliquoted, the remainder was centrifuged at 6,000g for 45 s, resuspended with 0.25× volume of the supernatant, and a portion of that was diluted 1:1 in supernatant. The remaining cells were incubated on pads at 37 °C until 90 mpi, at which point they were assessed for productive infections by light microscopy. Plunging of samples began 20–30 min after removal from 37 °C, which significantly slows infection progression. Since phage nuclei were observed in this sample after cryoFIB-ET, the sample was suitable for the analyses performed in this study.

A volume of 4–7 µl of cells were deposited on R2/1 Cu 200 grids (Quantifoil) that had been glow-discharged for 1 min at 0.19 mbar and 20 mA in a PELCO easiGlow device shortly before use. Grids were mounted in a custom-built manual plunging device (Max Planck Institute of Biochemistry) and excess liquid blotted with filter paper (Whatman no. 1) from the backside of the grid for 5–7 s prior to freezing in a 50:50 ethane:propane mixture (Airgas) cooled by liquid nitrogen.

Grids were mounted into modified Autogrids (Thermo Fisher Scientific) compatible with cryo-focused ion-beam milling. Samples were loaded into an Aquilos 2 cryo-focused ion-beam/scanning electron microscope (TFS) and milled to generate lamellae approximately 150–250 nm thick as previously described24.

Lamellae were imaged using a Titan Krios G3 transmission electron microscope (Thermo Fisher Scientific) operated at 300 kV configured for fringe-free illumination and equipped with a K2-directed electron detector (Gatan) mounted post Quantum 968 LS imaging filter (Gatan). The microscope was operated in EFTEM mode with a slit-width of 20 eV and using a 70 µm objective aperture. Automated data acquisition was performed using SerialEM-v3.8b1125 and all images were collected using the K2 in counting mode.

For lamellae of 201phi2-1-infected P. chlororaphis, tilt series were acquired at a 3.46 Å pixel size over a nominal range of ±51° in 3° steps with a grouping of 2 using a dose-symmetric scheme26 with a per-tilt fluence of 1.8 e Å−2 and total of about 120 e Å−2 per tilt series. Nine tilt series were acquired with a realized defocus range of −4.5 to −6 µm along the tilt axis. An additional two datasets of six tilt series each were collected at a pixel size of 4.27 Å with nominal tilt ranges of ±50° and ±60° in 2° steps with a grouping of 2 using a dose-symmetric scheme with a per-tilt fluence of 1.8–2.0 e Å−2 and total of about 100–110 e Å−2 per tilt series.

For lamellae of Goslar-infected APEC 2248, tilt series were acquired at a 4.27 Å pixel size over a nominal range of ±56° in 2° steps with a grouping of 2 using a dose-symmetric scheme with a per-tilt fluence of 2.6 e Å−2 and total of about 150 e Å−2 per tilt series. Twenty-one tilt series were acquired with a realized defocus range of −5 to −6 µm along the tilt axis.

### Image processing and subtomogram analysis of in situ cryo-EM data of the phage nucleus

All tilt series pre-processing was performed using Warp-v1.09 unless otherwise specified27. Tilt movies were corrected for whole-frame motion and aligned via patch tracking using Etomo (IMOD-v4.10.28)28. Tomograms were reconstructed with the deconvolution filter for visualization and manual picking in 3dmod (IMOD-v4.10.28). All subsequently reported resolution estimates are based on the 0.143-cutoff criterion of the Fourier shell correlations between masked, independently refined half-maps using high-resolution noise substitution to mitigate masking artefacts29.
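As an illustration of the 0.143-cutoff criterion, the sketch below finds where a Fourier shell correlation curve first drops below the threshold, using linear interpolation between shells. It assumes a hypothetical FSC curve and does not implement masking or high-resolution noise substitution; the function name is illustrative and is not taken from RELION, Warp or M.

```python
def fsc_resolution(spatial_freqs, fsc_values, cutoff=0.143):
    """Return the resolution (in Å) at which the FSC first falls below `cutoff`.

    spatial_freqs: shell frequencies in 1/Å, in ascending order.
    fsc_values:    FSC value for each shell.
    Returns None if the curve never crosses the cutoff.
    """
    for i in range(1, len(fsc_values)):
        prev, curr = fsc_values[i - 1], fsc_values[i]
        if prev >= cutoff > curr:
            # Linear interpolation between the two shells bracketing the cutoff.
            frac = (prev - cutoff) / (prev - curr)
            freq = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None


# Hypothetical FSC curve (not data from the study).
freqs = [0.01, 0.02, 0.03, 0.04, 0.05]
fsc = [0.99, 0.90, 0.50, 0.243, 0.043]
resolution = fsc_resolution(freqs, fsc)
print(f"estimated resolution: {resolution:.1f} Å")
```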

First, for the 201phi2-1-infected P. chlororaphis dataset collected at 3.46 Å per pixel, subtomogram averaging of the P. chlororaphis host cell ribosomes was performed in order to improve initial tilt series alignments using the recently developed multi-particle framework, M30 (Extended Data Fig. 1n). A set of 400 particles was manually picked across the tomograms, extracted at 20 Å per pixel, and aligned in RELION-v3.1.1 to generate an initial reference31,32. This data-derived reference was used for template-matching against 20 Å per pixel tomograms at a sampling rate of 15°. Template-matched hits were curated in Cube (https://github.com/dtegunov/cube), ultimately resulting in 17,169 particle picks. The initial particle set was extracted at 10 Å per pixel and subjected to Class3D, after which 11,148 particles were selected for further analysis. Refine3D of the curated particle set reached the binned Nyquist limit of 20 Å. The refined particles were imported into M-v1.09 at 3.46 Å per pixel. Three iterations of refinement were performed starting with image-warp and particle poses, then incorporating refinement of stage angles and volume-warp, and finally including individual tilt-movie alignment. This procedure resulted in a ribosome reconstruction at an estimated resolution of about 11 Å. Further refinement of the particles in RELION yielded a reconstruction at an estimated resolution of about 10 Å. Neither additional attempts at 3D classification nor multi-particle refinement led to an improved ribosome reconstruction.
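The "binned Nyquist limit" quoted above follows from the sampling theorem: the finest resolvable period is twice the pixel size. A minimal sketch of that relationship, with an illustrative function name:

```python
def nyquist_limit(pixel_size_angstrom):
    """Nyquist resolution limit (Å) for data sampled at the given pixel size (Å per pixel)."""
    return 2.0 * pixel_size_angstrom


# Particles extracted at 10 Å per pixel cannot be resolved beyond 20 Å.
binned_limit = nyquist_limit(10.0)
# Unbinned 3.46 Å per pixel data allow refinement to proceed further.
unbinned_limit = nyquist_limit(3.46)
print(binned_limit, round(unbinned_limit, 2))
```

This is why particles are re-extracted at finer sampling once a refinement reaches the limit of the binned data.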

New tomograms were reconstructed at 20 Å per pixel using the ribosome alignment metadata. The perimeters of the 201phi2-1 phage nuclei were coarsely traced in these updated tomograms using 3dmod32. Traces were converted into surface models using custom MATLAB (MathWorks, v2019a) scripts and built-in Dynamo-v1.1.514 functions33. Points were sampled every 4 nm along the surface models, oriented normal to the surface (that is, positive Z towards the cytosol), and extracted from normalized, contrast transfer function (CTF)-corrected tomograms at 10 Å per pixel in a 480 Å side-length box using Dynamo33. Initial orientations of the 66,887 extracted particles were curated using the Place Object plugin34 for UCSF Chimera-v1.1535 and incorrectly oriented particles were manually flipped.

To generate an initial reference, a subset of 17,622 particles from 2 tomograms were subjected to reference-free alignment in Dynamo-v1.1.514 for several iterations. For this procedure, no point-group symmetry was enforced, alignment was limited to 40 Å by an ad hoc lowpass filter each iteration, and the out-of-plane searches were restricted to prevent flipping of sidedness. The alignment converged to yield a reconstruction conforming to an apparent square (p4, 442) lattice. Analysis of particle positions and orientations using Place Objects34 and neighbour plots36 was consistent with the reconstructed average and indicated a spacing of approximately 11.5 nm between congruent four-fold axes.

The initial reference was subsequently used to align the entire dataset for a single iteration in Dynamo. For this step, alignment was limited to 40 Å, the out-of-plane searches were restricted to prevent flipping of sidedness, C4 symmetry enforced, and a box-wide by 240 Å cylindrical alignment mask applied. Inspection of particle positions and orientations using Place Objects34 and neighbour plots36 was again consistent with a square-lattice-like arrangement. To deal with the initial over-sampling, particle duplicates were identified as those within 9 nm centre-to-centre distance of another particle and the one with the lower cross-correlation to the reconstruction removed, which resulted in 21,165 retained particles. In addition, a geometry-based cleaning step was performed to remove particles with fewer than three neighbours within 10 to 13 nm, which resulted in 8,454 retained particles.
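The two cleaning steps above (distance-based duplicate removal, then neighbour-count filtering) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the scripts used in the study: duplicates are resolved greedily in order of descending cross-correlation, the search is a brute-force O(n²) loop, and the function names and toy coordinates are hypothetical.

```python
import math


def remove_duplicates(particles, min_dist=9.0):
    """Keep the higher-scoring particle of any pair closer than `min_dist` (nm).

    particles: list of (x, y, z, cc) tuples, coordinates in nm.
    Greedy strategy (an assumption): visit particles by descending cc and keep
    one only if no already-kept particle lies within `min_dist`.
    """
    kept = []
    for p in sorted(particles, key=lambda q: q[3], reverse=True):
        if all(math.dist(p[:3], k[:3]) >= min_dist for k in kept):
            kept.append(p)
    return kept


def clean_by_neighbours(particles, r_min=10.0, r_max=13.0, min_neighbours=3):
    """Drop particles with fewer than `min_neighbours` others at r_min..r_max nm."""
    retained = []
    for i, p in enumerate(particles):
        count = sum(
            1
            for j, q in enumerate(particles)
            if i != j and r_min <= math.dist(p[:3], q[:3]) <= r_max
        )
        if count >= min_neighbours:
            retained.append(p)
    return retained


# Toy data: a 3 x 3 square lattice with 11.5 nm spacing, one low-scoring
# duplicate near the centre, and one isolated outlier.
lattice = [(i * 11.5, j * 11.5, 0.0, 0.5) for i in range(3) for j in range(3)]
toy = lattice + [(13.5, 11.5, 0.0, 0.2), (100.0, 100.0, 0.0, 0.4)]

deduplicated = remove_duplicates(toy)
cleaned = clean_by_neighbours(deduplicated)
print(len(toy), len(deduplicated), len(cleaned))
```

In the toy lattice, corner points have only two neighbours in the 10 to 13 nm window and are therefore removed along with the outlier, which mirrors how this criterion trims lattice edges as well as spurious picks.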

The curated particle set was split into approximately equal half-sets on a per-tomogram basis, converted to the STAR file format using the dynamo2m-v0.2.2 package37, and re-extracted into a 480 Å-side-length box at 5 Å per pixel in Warp for use in RELION. A round of Refine3D was performed using a 40 Å lowpass filtered reference, C4 symmetry, local-searches starting at 3.7°, and a box-wide soft shape mask. This resulted in a reconstruction at an estimated resolution of 24 Å for the 8,454-particle set.
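Splitting half-sets on a per-tomogram basis keeps all particles from one tomogram in the same half, so neighbouring or overlapping particles cannot leak across half-maps. The text does not state how tomograms were assigned; the sketch below assumes a simple greedy balancing scheme, and the tomogram names and counts are hypothetical.

```python
def split_half_sets(particle_counts):
    """Assign whole tomograms to half-set 1 or 2 so that the totals stay close.

    particle_counts: dict mapping tomogram name -> number of particles.
    Greedy heuristic (an assumption): place the largest tomograms first, each
    into whichever half-set currently holds fewer particles.
    """
    totals = {1: 0, 2: 0}
    assignment = {}
    ordered = sorted(particle_counts.items(), key=lambda kv: kv[1], reverse=True)
    for tomogram, count in ordered:
        half = 1 if totals[1] <= totals[2] else 2
        assignment[tomogram] = half
        totals[half] += count
    return assignment, totals


# Hypothetical per-tomogram particle counts.
counts = {"tomo_a": 1200, "tomo_b": 950, "tomo_c": 900, "tomo_d": 700, "tomo_e": 400}
assignment, totals = split_half_sets(counts)
print(assignment, totals)
```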

Classification without alignment was performed using a 320 Å spherical mask and C4 symmetry, to promote convergence from the relatively low particle count. This differentiated three distinct classes corresponding to concave (2,033 particles), flat (4,475 particles) and convex (945 particles) states of the central tetramer, along with a noisy class (1,001 particles). The particles corresponding to the interpretable classes were re-extracted into a 320 Å box at 5 Å per pixel and subjected to 3D auto-refinement as described for the consensus reconstruction. The estimated resolutions for the concave (lowered), flat (intermediate) and convex (raised) classes were 20 Å, 18 Å and 23 Å, respectively. Refinement in M of either the consensus particle set or the three aforementioned classes did not yield notable improvements in the reconstructions. This may be attributed to prior refinement of the tilt series alignment, the most resolution-limiting factor, using the host ribosomes30.

The 12 tilt series of 201phi2-1-infected P. chlororaphis dataset collected at 4.27 Å per pixel were pre-processed and host ribosomes averaged as described above. The ribosome reconstruction from above was lowpass filtered to 40 Å and used for template-matching in Warp-v1.09, and curated in Cube to yield 47,469 particle positions. Ribosomes were extracted at 10 Å per pixel and subjected to reference-free 3D classification in RELION-v3.1.1 which resulted in 15,782 subtomograms. Masked auto-refinement resulted in a reconstruction with an estimated resolution of 28 Å. Refinement of tilt series parameters in M improved the resolution to about 20 Å (not shown).

For the 201phi2-1 nucleus in the 4.27 Å per pixel dataset, we were unable to completely resolve the quasi-lattice register using either an ab initio reference or the reconstruction from above as determined by neighbour plots. Thus, this dataset was solely included in the analysis of the unidentified spherical bodies and not further subtomogram averaging.

The tilt series of Goslar-infected APEC 2248 were pre-processed and host ribosomes averaged similarly as described above. An initial host ribosome reference was generated from 400 randomly selected particles and used for template-matching in Warp-v1.09, and curated in Cube to yield 98,981 particle positions. Ribosomes were extracted at 10 Å per pixel and subjected to reference-free 3D classification in RELION-v3.1.1 which distinguished 70S and 50S classes containing 46,056 and 3,710 particles, respectively. Particles were re-extracted at 6 Å per pixel and subjected to 3D auto-refinement to yield 20 Å and 14 Å for the 50S and 70S classes, respectively. Refinement of tilt series parameters in M, followed by an additional round of 3D auto-refinement at 4.27 Å per pixel resulted in 12 Å and 8.54 Å (Nyquist limit of the data) for the 50S and 70S, respectively.

For the Goslar nucleus, the nuclei perimeters were traced from 6 tomograms and used to extract over-sampled points normal to the surface, which resulted in 42,416 initial particles. A subset of 10,512 particles were used to generate an ab initio reference in C1 as described above. The initial Goslar reference presented a similar spacing (~11.5 nm) and apparent C4 symmetry as the 201phi2-1 reconstruction; however, the average converged on the opposite four-fold axis. For ease of subsequent analysis, the centre of the Goslar initial reference was shifted to match 201phi2-1. Alignment of the entire dataset was performed as described above and distance-based cleaning post-alignment resulted in 4,501 particles for further processing. Alignment and reconstruction in RELION enforcing C4 symmetry resulted in a reconstruction with an estimated resolution of 27 Å. Similar to 201phi2-1, 3D classification of the consensus refinement without alignment separated the data into classes in which the central tetramer appeared concave (2,802) or convex (1,699). Refinement of these classes resulted in reconstructions with estimated resolutions of 20 Å and 25 Å for the concave (lowered) and convex (raised) classes, respectively.

### Analysis of unidentified spherical bodies

USBs were manually identified and their maximal apparent diameters measured from their line intensity profiles in 20 Å per pixel tomograms using FIJI38. In order to assess whether the surfaces of these compartments possessed an underlying structure, we attempted subtomogram analysis of the compartment surfaces essentially as described in37,39. We were unable to obtain a reconstruction exhibiting a regular underlying structure as assessed both visually and by neighbour plots. However, despite their differing exterior membrane, the interior density of the USBs is visually consistent with nucleic acid like that of the interior of the phage nucleus.

### Segmentation and visualization of in situ tomography data

Segmentation of host cell membranes and the phage nucleus perimeter was performed on 20 Å per pixel tomograms by first coarsely segmenting using TomoSegMemTV40 followed by manual patching with Amira-v6.7 (TFS). For the purposes of segmentation, phage capsids, tails, PhuZ, and RecA-like particles were subjected to a coarse subtomogram averaging procedure using particles sampled at 20 Å per pixel. For capsids, all particles were manually picked. Reference-generation and alignment of capsids was performed by enforcing icosahedral symmetry with Relion-v3.1.131,32 (despite the capsids possessing C5 symmetry) in order to promote convergence from the low number of particles. For the phage tails, the start and end points along the filament axis were defined manually and used to seed over-sampled filament models in Dynamo-v1.1.51433,41. An initial reference for the tail was generated using Dynamo-v1.1.514 from two full-length tails with clear polarity. The resulting reference displayed apparent C6 symmetry, which was enforced for the alignment of all tails from a given tomogram using Dynamo-v1.1.514 and Relion-v3.1.1. Similar to the phage tails, the PhuZ and RecA-like filaments were picked and refined but without enforcing symmetry. We do not report resolution claims for these averages and use them solely for display purposes in segmentations. Duplicate particles were removed and final averages were placed back in the reference-frame of their respective tomograms using dynamo_table_place. For clarity, a random subset of 500 ribosomes were selected for display in the segmentation.

### Surface curvature estimates from segmentation

The segmentation of the phage nucleus, depicted in Fig. 1d, sampled at 2 nm per pixel, was used to estimate the principal curvatures of the shell using PyCurv42. PyCurv was run with default parameters and a hit radius of three pixels. For visualization purposes, the principal curvature values (κ1, κ2; nm−1) were converted, via the corresponding radius of curvature (r (nm)), to an angle (𝜃 (degrees)) using the following formula:

$$\theta = 2\arctan\left(\frac{s}{2r}\right)$$

where r is the radius of curvature (inverse of κn) and s is the side length of the circumscribed polygon, taken as 5.75 nm.
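The conversion can be sketched in a few lines of Python. Because r is the inverse of κ, the argument s/(2r) equals sκ/2, which avoids a division by zero for flat regions (κ = 0). The function name is illustrative, and the example curvature is hypothetical rather than a value measured in the study.

```python
import math


def curvature_to_angle(kappa_per_nm, side_nm=5.75):
    """Convert a principal curvature (nm^-1) to an angle in degrees.

    Implements theta = 2 * arctan(s / (2 r)) with r = 1 / kappa,
    rewritten as 2 * arctan(s * kappa / 2).
    """
    return math.degrees(2.0 * math.atan(side_nm * kappa_per_nm / 2.0))


# Hypothetical example: a radius of curvature of 100 nm (kappa = 0.01 nm^-1).
angle = curvature_to_angle(1.0 / 100.0)
print(f"{angle:.2f} degrees")
```

The sign of the curvature is preserved, so concave and convex regions map to angles of opposite sign.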

### Protein expression and purification

Full-length Chimallin from bacteriophages 201phi2-1 (gp105; NCBI Accession YP_001956829.1) and Goslar (gp189; NCBI Accession YP_009820873.1) were cloned with an N-terminal TEV protease-cleavable His6 tag using UC Berkeley Macrolab vector 2-BT (Addgene #29666). Truncations and other modified constructs were cloned by PCR mutagenesis and isothermal assembly, and inserted into the same vector. Proteins were expressed in E. coli Rosetta2 pLysS (Novagen) by growing cells to A600 = 0.8, inducing expression with 0.3 mM IPTG, then growing cells at 20 °C for 16–18 h. Cells were harvested by centrifugation and resuspended in buffer A (50 mM Tris pH 7.5, 10 mM imidazole, 300 mM NaCl, 10% glycerol, and 2 mM β-mercaptoethanol), then lysed by sonication and the lysate cleared by centrifugation. Protein was purified by Ni2+ affinity chromatography. The purified proteins were centrifuged briefly to pellet floating particles (visible large assemblies and aggregated proteins). The proteins were dialysed into buffer B (20 mM Tris pH 7.5, 250 mM NaCl, 2 mM β-mercaptoethanol) and the N-terminal histidine tag was cleaved using TEV protease with overnight incubation at 4 °C. The recovered tagless proteins were further purified to homogeneity on a Superose 6 Increase 10/300 GL column (Cytiva) in buffer B. The quality of purified proteins was verified by SDS–PAGE analysis.

For analysis by SEC–MALS, a 100 μl sample of protein at 2 mg ml−1 was passed over a Superose 6 Increase 10/300 GL column (Cytiva) in buffer B. Light scattering and refractive index profiles were collected by miniDAWN TREOS and Optilab T-rEX detectors (Wyatt Technology), respectively, and molecular weight was calculated using ASTRA v. 8 software (Wyatt Technology).

### Cryo-EM of in vitro samples and image acquisition

For grid preparation, freshly purified recombinant 201phi2-1 chimallin was collected from size-exclusion chromatography (estimated concentration of 4 µM of the monomer, 0.3 mg ml−1). Immediately prior to use, R2/2 Cu 300 grids (Quantifoil) were glow-discharged for 1 min at 0.19 mbar and 20 mA in a PELCO easiGlow device. Sample was applied to a grid as a 3.2 µl drop in the environmental chamber of a Vitrobot Mark IV (Thermo Fisher Scientific) held at 16 °C and 100% humidity. Upon application of the sample, the grid was blotted immediately with filter paper for 3 s prior to plunging into a 50:50 ethane:propane mixture cooled by liquid nitrogen. Grids were mounted into standard AutoGrids (Thermo Fisher Scientific) for imaging. Grids for recombinant Goslar chimallin protein were prepared similarly, but with the modification that the sample was concentrated to approximately 33 µM of the monomer (2.5 mg ml−1) prior to plunge-freezing. The void peaks from each purification were frozen similarly at the eluted concentration after dilution 1:1 with 6 nm BSA-tracer gold (Electron Microscopy Sciences).

All samples were imaged using a Titan Krios G3 transmission electron microscope (Thermo Fisher Scientific) operated at 300 kV configured for fringe-free illumination and equipped with a K2-directed electron detector (Gatan) mounted post Quantum 968 LS imaging filter (Gatan). The microscope was operated in EFTEM mode with a slit-width of 20 eV and using a 70 µm objective aperture. Automated data acquisition was performed using SerialEM-v3.8b1125 and all images were collected using the K2 in counting mode.

For the 201phi2-1 24mer sample, tilt series were acquired using a pixel size of 1.376 Å with a per-tilt fluence of 4.7 e Å−2 using a dose-symmetric scheme25 from ±51° in 3° steps and a grouping of 3, resulting in a fluence of 164.5 e Å−2 per tilt series. In total, 4 tilt series were collected with a realized defocus of −2.5 to −4 µm along the tilt axis. Movies for SPA were recorded at a pixel size of 1.075 Å with fluence of 42.6 e Å−2 distributed uniformly over 40 frames. Automated data acquisition was performed using image shift with active beam tilt compensation to acquire nine movies per hole per stage movement. In total, 4,192 movies were acquired with a realized defocus range of −0.1 to −1.5 µm.
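The total fluence per tilt series follows directly from the tilt range, step and per-tilt fluence. A minimal sketch of that arithmetic for the 201phi2-1 24mer tilt series (±51° in 3° steps at 4.7 e Å−2 per tilt); the function name is illustrative and the acquisition order of the dose-symmetric scheme is not modelled.

```python
def tilt_angles(max_tilt_deg, step_deg):
    """All nominal tilt angles from -max_tilt to +max_tilt, inclusive, in ascending order."""
    n_steps = max_tilt_deg // step_deg
    return [i * step_deg for i in range(-n_steps, n_steps + 1)]


angles = tilt_angles(51, 3)
per_tilt_fluence = 4.7  # e per square ångström, as stated in the text
total_fluence = len(angles) * per_tilt_fluence
print(len(angles), round(total_fluence, 1))
```

With 35 tilts this reproduces the stated 164.5 e Å−2 per tilt series.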

For the Goslar 24mer sample, movies for SPA were recorded at a pixel size of 0.8452 Å with fluence of 40 e Å−2 distributed uniformly over 44 frames. Automated data acquisition was again performed using image shift with active beam tilt compensation to acquire 10 movies per hole per stage movement. In total, 3921 movies were acquired with a realized defocus range of −0.1 to −1.5 µm.

For void peak samples, tilt series were acquired similarly to that of the 201phi2-1 24mer sample but using a pixel size of 1.752 Å and tilt-range of ±60°.

### Image processing of in vitro cryo-EM data

All movie pre-processing was performed using Warp-v1.09 unless otherwise specified27. Tilt-movies of the 201phi2-1 chimallin were corrected for whole-frame motion and aligned via patch tracking using Etomo (IMOD-v4.10.28)28. Tomograms were reconstructed with the deconvolution filter for visualization and manual picking of subtomograms using 3dmod (IMOD-v4.10.28)43. A total of 203 manually picked subtomograms and their corresponding 3D CTF volumes were reconstructed with a 288 Å side length. Subtomograms were aligned and averaged initially in C1 by reference-free refinement as implemented in RELION-v3.1.131,32 to an estimated resolution of 22 Å. The C1 reconstruction displayed features consistent with a cubic assembly of the 201phi2-1 chimallin protomers. Thus, an additional round of refinement using the C1 reconstruction as a reference and enforcing O point-group symmetry improved the estimated resolution to 18 Å.

For the single-particle 201phi2-1 chimallin data, movies were motion-corrected with exposure-weighting and initial CTF parameters estimated using 5 × 5 grids. Micrographs were culled by thresholding for an estimated defocus in the range of 0.3–1.5 µm and CTF-fit resolutions better than 6 Å resulting in 4,098 micrographs for further processing. An initial set of 140,782 particle positions were picked with BoxNet2 (Warp-v1.09)27 using a model re-trained on 20 manually curated micrographs and using a threshold of 0.95. Particle images were extracted using a 396 Å side length. All further processing was performed using RELION-v3.1.132 unless otherwise specified. A single round of reference-free 2D-classification was performed and the 128,798 particle images assigned to the averages displaying internal features were selected for further processing. At this stage, analysis of the 2D averages suggested the presence of four-fold, three-fold and two-fold symmetry axes, consistent with a cubic arrangement of the chimallin protomers in the particles. Thus, we subjected the particle images to 3D refinement using the subtomogram average obtained from above as an initial reference lowpass filtered to 35 Å and O point-group symmetry enforced, which resulted in a reconstruction at an estimated resolution of 4.2 Å. However, the reconstruction did not display features consistent with this resolution estimate (for example, β-strands were not separated). The high apparent point-group symmetry and distribution of 2D class averages did not support the inflated resolution being due to a preferred orientation. Partitioning particles into half-sets by micrograph did not change the estimated resolution of reconstruction, indicating the inflated estimate was not due to splitting identical or adjacent particles across the half-sets. In addition, extensive 3D classification with and without symmetry enforced did not yield distinct classes. 
Therefore, the possibility of quasi-symmetry was investigated by performing localized reconstructions of sub-structures within the particles. To reduce computational burden, the apparent O symmetry was first partially expanded to C4 by using relion_particle_symmetry_expand to fully expand to C1 and then removing redundant image replicates (noting that redundant views of the four-fold axes possess the same last two Euler angles) to yield 772,788 sub-particles. Refinement of the partially expanded particles while enforcing C4 point-group symmetry and using a soft shape mask resulted in a reconstruction with an estimated resolution of 3.6 Å with notably improved features. Re-centring and re-extraction using a 245 Å side length followed by refinement improved the estimated resolution to 3.6 Å. CTF refinement44 (per particle defocus, per micrograph astigmatism, beam tilt, and trefoil) and Bayesian polishing45 successively improved the resolution further to 3.5 Å and 3.4 Å, respectively. A round of 2D-classification without alignment was performed to remove particles assigned to empty or poorly resolved classes, which yielded a set of 664,363 sub-particles and no change in the estimated resolution upon re-running 3D refinement. Although the reconstruction substantially improved through this procedure, the C4 map still exhibited distorted density (for example, elongated helices). Attempts at 3D classification did not separate distinct classes. Thus, a localized reconstruction was performed focused on the individual chimallin protomer in C1. Again, to reduce computational burden, before expanding the symmetry to C1 another round of Bayesian polishing was performed in which the sub-particle images were extracted using a 354 Å side length and premultiplied by their CTF before cropping in real space to a 200 Å side length.
After another round of 3D refinement enforcing C4 point-group symmetry, the data were expanded to C1, which resulted in 2,657,452 sub-particles and refined to an estimated resolution of 3.3 Å. The Bayesian polishing job was re-run to extract sub-particles at the full box size and without premultiplication by their CTF for import into cryoSPARC-v3.246. A single round of local non-uniform refinement47 was performed in C1 using a user-supplied static mask, marginalization, and FSC noise substitution options, which led to a final reconstruction of the 201phi2-1 chimallin monomer at an estimated resolution of 3.1 Å.
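The sub-particle counts reported in this procedure follow from the orders of the point groups involved: the octahedral group O has 24 rotations and C4 has 4, so partial expansion from O to C4 multiplies the particle count by 6, and expansion from C4 to C1 multiplies it by 4. The sketch below only checks this bookkeeping; it is not a substitute for relion_particle_symmetry_expand, and the names are illustrative.

```python
# Orders (number of rotational operators) of the point groups involved.
GROUP_ORDER = {"O": 24, "C4": 4, "C1": 1}


def expanded_count(n_particles, from_group, to_group):
    """Number of sub-particles after expanding from one point group to a subgroup."""
    from_order = GROUP_ORDER[from_group]
    to_order = GROUP_ORDER[to_group]
    if from_order % to_order != 0:
        raise ValueError("target group order must divide the source group order")
    return n_particles * (from_order // to_order)


# 128,798 particles selected after 2D classification, expanded from O to C4.
c4_sub_particles = expanded_count(128_798, "O", "C4")
# 664,363 C4 sub-particles remaining after cleaning, expanded from C4 to C1.
c1_sub_particles = expanded_count(664_363, "C4", "C1")
print(c4_sub_particles, c1_sub_particles)
```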

The Goslar chimallin single-particle data were pre-processed similarly to the 201phi2-1 chimallin data, which after initial thresholding resulted in 2,889 micrographs for further processing. Initial particle positions were identified using the 201phi2-1 chimallin-trained BoxNet2 (Warp-v1.09)27 model with a threshold of 0.1, which resulted in 289,387 picks. Particles were extracted using a 400 Å side length and subjected to iterative rounds of 2D-classification and sub-selection, which resulted in 78,532 particles used for initial 3D refinement. The Goslar chimallin particles exhibited the same quasi-symmetry as the 201phi2-1 chimallin described above, thus were processed using the same localized reconstruction procedure. The quasi-O, quasi-C4 and C1 reconstructions yielded estimated resolutions of 4.0 Å, 2.6 Å and 2.4 Å, respectively. The quasi-C4 and C1 reconstructions within RELION32 were performed on particle images that were extracted using a 470 Å side length, premultiplied by their CTF, and cropped in real space to a 200 Å side length. The final C1 reconstruction was performed in cryoSPARC-v3.246 as described above, which led to a final reconstruction of the Goslar chimallin monomer at an estimated resolution of 2.3 Å from 1,407,340 sub-particle images.

All resolution estimates are based on the 0.143-cutoff criterion of the Fourier shell correlations between masked independently refined half-maps using high-resolution noise substitution to mitigate masking artefacts29. Local resolution estimates were computed using RELION with default parameters. Resolution anisotropy for the C1 reconstructions was assessed using the 3DFSC48 web server, which reported sphericity values of 0.963 and 0.994 for the 201phi2-1 and Goslar maps, respectively.

Void peak tilt series were processed similarly to the 201phi2-1 24mer tilt series, but using the gold-fiducials for alignment instead of patch tracking in Etomo28.

### Coordinate model building and refinement

Initial monomer models were generated via the DeepTracer web server49 followed by manual building in COOT-v0.9.150 and subjected to real-space refinement in PHENIX-v1.19.251. To generate tetramer models, monomer models were rigid-body docked into the C4 maps using UCSF Chimera-v1.1535 and the N-terminal segments joined to the appropriate protomer cores. To generate 24mer models, tetramer models were rigid-body docked into the hexahedral maps and the C-terminal segments reassigned to the appropriate protomer cores. To ensure robust refinement, tetramer and 24mer structures were refined with C4 or O non-crystallographic symmetry constraints and reference-model restraints based on high-resolution monomer structures. Isotropic atomic displacement parameters were refined against the respective unsharpened maps. All models were validated using MolProbity52 and EMRinger53 (SI Table 2). EMRinger scores for the 201phi2-1 24mer, tetramer, and monomer models were 0.46, 2.86, and 2.39, respectively. EMRinger scores for the Goslar 24mer, tetramer, and monomer models were 0.92, 3.10, and 3.59, respectively.

### Interface analysis

Interface analysis for the cubic assemblies to identify interacting residues and to calculate buried surface area was performed using the ePISA-v1.5254 and CaPTURE55 web servers.

### Nine-tetramer sheet modelling

Nine chimallin tetramers were arranged in a flat sheet (3 × 3) structure by fitting in the consensus subtomogram average. Assuming the unfolding of the cubical assembly to create a flat sheet structure, the interacting C-terminal segments in the corner three-fold axis were reassigned in a four-fold symmetry axis to the corresponding protomer. The missing residues between the C-terminal domain and C-terminal segment were built manually in COOT-v0.9.1, ensuring no clash with other modelled atoms (taking the flat sheet model into consideration)50. The missing loop region in a protomer (residues 307–319) was built using the DaReUS-Loop web server56. This modelled chain (residues 45–612) was used to re-create the flat sheet structure by applying symmetry. Finally, in this flat sheet model, the protruding C-terminal segments of peripheral protomers were trimmed and twelve interacting segments in the periphery were included in the final model (48 chains).

### Protonation state assignment and electrostatics estimates

The electrostatic surface representation was generated with APBS-v3.0.057 using the AMBER99 force field58; protonation states were assigned at pH 7.5 using PROPKA-v3.4.059 through PDB2PQR-v3.4.060,61.

### Elastic network models

Elastic network models61,62,63 are a subset of normal mode analysis64,65. Here we used anisotropic network models (ANM)66 and Gaussian network models (GNM)67,68,69. Both of these models simplify the protein structure into a series of nodes, with an internode potential energy function governing node motion. Diagonalizing the resulting interaction matrix yields the normal modes: each mode is an eigenvector whose corresponding eigenvalue determines the frequency of that motion in the model, and the lowest-frequency modes best describe the structure’s intrinsic motions. ProDy (version 1.0), a software program enabling calculation of ANM and GNM modes70,71, was used in this study. We created 20,412 nodes for the ANM and GNM calculations, which is, to our knowledge, the largest number of nodes used in ProDy to date.

The five lowest-frequency GNM modes accounted for 76% of the overall variance. Because not all ENM modes are needed to capture the system’s dynamics72, we selected these five GNM modes and the five lowest-frequency ANM modes to use in our models. The GNM’s Kirchhoff matrix was built with a pairwise interaction cutoff distance of 10 Å and a spring constant of 1.0, while the ANM’s Hessian matrix used a pairwise interaction cutoff distance of 15 Å and a spring constant of 1.0. The ANM structural ensemble movies used an r.m.s.d. difference of 25 Å from the original conformation to display the protein sheet’s flexibility.
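The GNM calculation described above can be sketched in a few lines. The example below assumes NumPy and is not the ProDy implementation used in this study: it builds a Kirchhoff (connectivity) matrix from node coordinates with a distance cutoff and spring constant, diagonalizes it, and reports the fraction of the total variance (the sum of inverse eigenvalues) carried by the lowest-frequency modes. The function names and the assumption of a single connected network are illustrative.

```python
import numpy as np


def kirchhoff(coords, cutoff=10.0, gamma=1.0):
    """Kirchhoff matrix: -gamma for node pairs within the cutoff,
    with each diagonal entry set so that its row sums to zero."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    k = -gamma * contact.astype(float)
    k[np.diag_indices_from(k)] = -k.sum(axis=1)
    return k


def gnm_modes(coords, n_modes=5, cutoff=10.0, gamma=1.0):
    """Return the n_modes lowest nonzero eigenvalues, their eigenvectors,
    and the fraction of the total variance that they explain."""
    vals, vecs = np.linalg.eigh(kirchhoff(coords, cutoff, gamma))
    # A connected network has exactly one zero eigenvalue (uniform shift).
    vals, vecs = vals[1:], vecs[:, 1:]
    variance = 1.0 / vals  # each mode contributes 1/eigenvalue
    fraction = variance[:n_modes].sum() / variance.sum()
    return vals[:n_modes], vecs[:, :n_modes], fraction
```

With the coordinates of the sheet model's nodes as `coords`, `gnm_modes(coords, n_modes=5, cutoff=10.0)` would return the kind of variance fraction quoted above. An ANM differs in using a 3N × 3N Hessian, which retains the direction of each motion.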

### Molecular dynamics simulations

Simulations were performed using the nine-tetramer chimallin sheet model. This structure was protonated and placed in a water box through Amber’s tleap module73. The system was neutralized with Na+ using a 12-6 ion model74,75. The CUDA version 10.1 implementation76,77,78 of Amber 20 was used73. The water model used was OPC79 with the Amber ff19SB force field80. The resulting system, including the protein and water box, contained 1,729,704 atoms. Energy minimization was performed for a total of 10,000 cycles using a combination of steepest descent and conjugate gradient methods76 while the heavy atoms were restrained with a force constant of 10.0 kcal mol−1 Å−2. Next, the system was slowly heated from 10.0 K to 300.0 K over 4 ns before stabilizing at 300.0 K for the next 6 ns using the NVT ensemble with a Langevin thermostat with a friction coefficient (collision frequency) of γ = 5.0 ps−1 and the heavy atoms restrained with a force constant of 1.0 kcal mol−1 Å−2. Equilibration was performed in the NPT ensemble for 20 ns, using a timestep of 2 fs and the SHAKE algorithm to constrain bonds involving hydrogens81. The equilibration temperature was set at 300.0 K with a Langevin thermostat with a friction coefficient82,83 (collision frequency) of γ = 1.0 ps−1, the pressure was set to 1 bar with a Berendsen barostat84 with a relaxation time constant of τ = 1.0 ps, and the heavy atoms were restrained with a force constant of 0.1 kcal mol−1 Å−2. Periodic boundary conditions were enforced with the van der Waals interaction cutoff at 8 Å, while long-range interactions were treated with the particle mesh Ewald algorithm85. After equilibration, the system was cloned into five replicates for the production runs, still set at 300.0 K in the NPT ensemble. Each was run for 300 ns, resulting in 1.5 µs of total sampling.
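The step counts and total sampling implied by this protocol can be checked with a short sketch. The code below is illustrative only and does not reproduce the input files used in this study. It assumes that the production runs used the same 2 fs timestep as equilibration (not stated above), and it renders the equilibration settings as an Amber-style `&cntrl` namelist in which the heavy-atom restraint mask is a placeholder. The `Stage` class and function name are hypothetical.

```python
from dataclasses import dataclass

FS_PER_NS = 1_000_000


@dataclass(frozen=True)
class Stage:
    name: str
    length_ns: int
    timestep_fs: int
    replicates: int = 1

    @property
    def steps(self) -> int:
        """Number of integration steps per replicate."""
        return self.length_ns * FS_PER_NS // self.timestep_fs

    @property
    def total_ns(self) -> int:
        """Aggregate simulated time over all replicates."""
        return self.length_ns * self.replicates


equilibration = Stage("NPT equilibration", length_ns=20, timestep_fs=2)
# Assumption: production reuses the 2 fs equilibration timestep.
production = Stage("NPT production", length_ns=300, timestep_fs=2, replicates=5)


def equilibration_namelist(stage: Stage) -> str:
    """Render the equilibration settings as an Amber &cntrl namelist."""
    settings = {
        "imin": 0,
        "nstlim": stage.steps,
        "dt": stage.timestep_fs / 1000,      # ps
        "ntc": 2, "ntf": 2,                  # SHAKE on bonds to hydrogen
        "cut": 8.0,                          # real-space cutoff, Å
        "ntt": 3, "gamma_ln": 1.0,           # Langevin thermostat, ps^-1
        "temp0": 300.0,
        "ntb": 2, "ntp": 1, "barostat": 1,   # NPT, Berendsen barostat
        "pres0": 1.0, "taup": 1.0,           # bar, ps
        "ntr": 1, "restraint_wt": 0.1,       # kcal mol^-1 Å^-2
        "restraintmask": "'!@H='",           # placeholder heavy-atom mask
    }
    body = ",\n".join(f"  {key}={value}" for key, value in settings.items())
    return f"&cntrl\n{body},\n/\n"
```

Under these assumptions, equilibration corresponds to 10,000,000 steps and each production replicate to 150,000,000 steps, and `production.total_ns` gives 1,500 ns, matching the 1.5 µs of total sampling.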

The resulting molecular dynamics trajectories were analysed through CPPTRAJ-v.25.686 and MDTraj-v1.9.487. In particular, r.m.s.d. was calculated with MDTraj. This was done by calculating the r.m.s.d. of the Cα atoms for all tetramers, as well as for just the central tetramer, from each trajectory and averaging the results (Supplementary Fig. 3).
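The r.m.s.d. averaging can be sketched as follows. This is a NumPy-only illustration rather than the MDTraj call used in this study: each frame is optimally superposed onto a reference with the Kabsch algorithm, and the per-frame traces are then averaged across replicates. The function names, the array layout and the optional `atom_idx` selection (for example, the Cα atoms of the central tetramer) are assumptions.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """r.m.s.d. between two (n_atoms, 3) arrays after optimal superposition."""
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_rmsd_trace(replicates, reference, atom_idx=None):
    """Average per-frame r.m.s.d. over replicates.

    replicates: list of arrays shaped (n_frames, n_atoms, 3) with equal
    frame counts; atom_idx optionally restricts the calculation to a subset.
    """
    sel = slice(None) if atom_idx is None else atom_idx
    traces = [
        [kabsch_rmsd(frame[sel], reference[sel]) for frame in traj]
        for traj in replicates
    ]
    return np.mean(traces, axis=0)
```

Calling `mean_rmsd_trace` once with all Cα indices and once with only those of the central tetramer would give the two kinds of trace described above.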

### Pore analysis

Pore annotation was performed using CHAP-v0.9.188, which was also used for all other analyses. The free energy and solvent density plots were averaged between physiologically identical pores across all simulation replicates. The inter-tetramer (corner four-fold) pore in the upper-left quadrant contained two frames that caused CHAP to crash; these frames were removed before averaging after consultation with the CHAP developers. Because we still averaged 1,502 frames × 4 pores − 2 bad frames = 6,006 frames for the inter-tetramer pores, we do not expect this removal to affect our conclusions.
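The frame bookkeeping above can be expressed as a small sketch. The code below is illustrative and is not CHAP output handling: it pools per-frame profiles from equivalent pores, drops the frames flagged as bad, and returns the bin-wise mean together with the number of frames kept. The data layout, pore names and function name are assumptions.

```python
def average_profiles(frames_by_pore, bad_frames=frozenset()):
    """Pool per-frame profiles from equivalent pores and average them.

    frames_by_pore: {pore_name: [profile, ...]}, one profile (a list of
        floats with the same number of bins) per frame.
    bad_frames: set of (pore_name, frame_index) pairs to exclude.
    Returns (mean_profile, n_frames_kept).
    """
    kept = [
        profile
        for pore, profiles in frames_by_pore.items()
        for index, profile in enumerate(profiles)
        if (pore, index) not in bad_frames
    ]
    if not kept:
        raise ValueError("no frames left after filtering")
    n_bins = len(kept[0])
    mean = [sum(p[b] for p in kept) / len(kept) for b in range(n_bins)]
    return mean, len(kept)
```

With four pores of 1,502 frames each and two frames excluded from one pore, the function averages over 6,006 frames, in line with the count given above.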

### Structure visualization and figure generation

Density maps, coordinate models and simulation trajectories were visualized and figures were generated with PyMOL-v2.5 (Schrödinger, 2021), UCSF Chimera-v1.1535, ChimeraX-v1.2.589, and VMD-1.9.4a3590.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
