FreeSAS Package

This is the public API of FreeSAS. Since most of the number crunching part is implemented in Cython, it is not documented here: please refer to the source code.

freesas.align

class InputModels[source]

Bases: object

assign_models(molecule=None)[source]

Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.

Parameters:: molecule – optional 2d array, coordinates of the atoms for the model to create
Return self.models:: list of SASModel

rcalculation()[source]

Calculation the maximal value for the R-factors, which is the mean of all the R-factors of inputs plus 2 times the standard deviation. R-factors are saved in the attribute self.rfactors, 1d array, and in percentage.

Return rmax:: maximal value for the R-factor

models_selection()[source]

Check if each model respect the limit for the R-factor

Return self.validmodels:: 1d array, 0 for a non valid model, else 1

rfactorplot(filename=None, save=False)[source]

Create a png file with the table of R factor for each model. A threshold is computed to discarded models with Rfactor>Rmax.

Parameters:

filename – filename for the figure, default to Rfactor.png
save – save automatically the figure if True, else show it

Return fig:

the wanted figures

class AlignModels(files, slow=True, enantiomorphs=True)[source]

Bases: object

Used to align DAM from pdb files

assign_models()[source]

Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.

Return self.models:: list of SASModel

optimize(reference, molecule, symmetry)[source]

Use scipy.optimize to optimize transformation parameters to minimize NSD

Parameters:

reference – SASmodel
molecule – SASmodel
symmetry – 3-list of +/-1

Return p:

transformation parameters optimized

Return dist:

NSD after optimization

alignment_sym(reference, molecule)[source]

Apply 8 combinations to the molecule and select the one which minimize the distance between it and the reference.

Parameters:

reference – SASModel, the one which do not move
molecule – SASModel, the one wich has to be aligned

Return combinaison:

best symmetry to minimize NSD

Return p:

transformation parameters optimized if slow is true, unoptimized else

makeNSDarray()[source]

Calculate the NSD correlation table and save it in self.arrayNSD

Return self.arrayNSD:: 2d array, NSD correlation table

plotNSDarray(rmax=None, filename=None, save=False)[source]

Create a png file with the table of NSD and the average NSD for each model. A threshold is computed to segregate good models and the ones to exclude.

Parameters:

rmax – threshold of R factor for the validity of a model
filename – filename for the figure, default to nsd.png
save – save automatically the figure if True, else show it

Return fig:

the wanted figures

find_reference()[source]

Find the reference model among the models aligned. The reference model is the one with lower average NSD with other models.

Return ref_number:: position of the reference model in the list self.models

alignment_reference(ref_number=None)[source]: Align all models in self.models with the reference one. The aligned models are saved in pdb files (names in list self.outputfiles)

alignment_2models(save=True)[source]

Align two models using the first one as reference. The aligned models are save in pdb files.

Return dist:: NSD after alignment

freesas.autorg

Functions for calculating the radius of gyration and forward scattering intensity.

auto_gpa(data, Rg_min=1.0, qRg_max=1.3, qRg_min=0.5)[source]

Uses the GPA theory to guess quickly Rg, the radius of gyration and I0, the forwards scattering

The theory is described in Guinier peak analysis for visual and automated inspection of small-angle X-ray scattering data Christopher D. Putnam J. Appl. Cryst. (2016). 49, 1412–1419

This fits sqrt(q²Rg²)*exp(-q²Rg²/3)*I0/Rg to the curve I*q = f(q²)

The Guinier region goes arbitrary from 0.5 to 1.3 q·Rg qRg_min and qRg_max can be provided

Parameters:

data – the raw data read from disc. Only q and I are used.
Rg_min – the minimal accpetable value for the radius of gyration
qRg_max – the default upper bound for the Guinier region.
qRg_min – the default lower bound for the Guinier region.

Returns:

autRg result with limited information

auto_guinier(data, Rg_min=1.0, qRg_max=1.3, relax=1.2)[source]

Yet another implementation of the Guinier fit

The idea: * extract the reasonable range * convert to the Guinier space (ln(I) = f(q²) * scan all possible intervall * keep any with qRg_max<1.3 (or 1.5 in relaxed mode) * select the begining and the end of the guinier region according to the contribution of two parameters:

(q_max·Rg - q_min·Rg)/qRg_max –> in favor of large ranges

1 / RMSD –> in favor of good quality data

For each start and end point, the contribution of all ranges are averaged out (using histograms) The best solution is the start/end position with the maximum average.

All ranges within this region are averaged out to measure Rg, I0 and more importantly their deviation.
The quality is still to be calculated
Aggergation is assessed according a second order polynom fit.

Parameters:

data – 2D array with (q,I,err)
Rg_min – minimum value for Rg
qRg_max – upper bound of the Guinier region
relax – relaxation factor for the upper bound
resolution – step size of the slope histogram

Returns:

autRg result

freesas.average

class Grid(inputfiles)[source]

Bases: object

This class is used to create a grid which include all the input models

spatial_extent()[source]

Calculate the maximal extent of input models

Return self.size:: 6-list with x,y,z max and then x,y,z min

calc_radius(nbknots=None)[source]

Calculate the radius of each point of a hexagonal close-packed grid, knowing the total volume and the number of knots in this grid.

Parameters:: nbknots – number of knots wanted for the grid
Return radius:: the radius of each knot of the grid

make_grid()[source]

Create a grid using the maximal size and the radius previously computed. The geometry used is a face-centered cubic lattice (fcc).

Return knots:: 2d-array, coordinates of each dot of the grid. Saved as self.coordknots.

class AverModels(inputfiles, grid)[source]

Bases: object

Provides tools to create an averaged models using several aligned dummy atom models

read_files(reference=None)[source]

Read all the pdb file in the inputfiles list, creating SASModels. The SASModels created are save in a list, the reference model is the first model in the list.

Parameters:: reference – position of the reference model file in the inputfiles list

calc_occupancy(griddot)[source]

Assign an occupancy and a contribution factor to the point of the grid.

Parameters:: griddot – 1d-array, coordinates of a point of the grid
Return tuple:: 2-tuple containing (occupancy, contribution)

assign_occupancy()[source]

For each point of the grid, total occupancy and contribution factor are computed and saved. The grid is then ordered with decreasing value of occupancy. The fourth column of the array correspond to the occupancy of the point and the fifth to the contribution for this point.

Return sortedgrid:: 2d-array, coordinates of each point of the grid

make_header()[source]: Create the layout of the pdb file for the averaged model.

save_aver(filename)[source]

Save the position of each occupied dot of the grid, its occupancy and its contribution in a pdb file.

Parameters:: filename – name of the pdb file to write

freesas.bift

Bayesian Inverse Fourier Transform

This code is the implementation of Steen Hansen J. Appl. Cryst. (2000). 33, 1415-1421

Based on the BIFT from Jesse Hopkins, available at: https://sourceforge.net/p/bioxtasraw/git/ci/master/tree/bioxtasraw/BIFT.py

Many thanks to Pierre Paleo for the auto-alpha guess

auto_bift(data, Dmax=None, alpha=None, npt=100, start_point=None, end_point=None, scan_size=11, Dmax_over_Rg=3)[source]

Calculates the inverse Fourier tranform of the data using an optimisation of the evidence

Parameters:

data – 2D array with q, I(q), δI(q). q can be in 1/nm or 1/A, it imposes the unit for r & Dmax
Dmax – Maximum diameter of the object, this is the starting point to be refined. Can be guessed
alpha – Regularisation parameter, let it to None for automatic scan
npt – Number of point for the curve p(r)
start_point – First useable point in the I(q) curve, this is not the start of the Guinier region
end_point – Last useable point in the I(q) curve
scan_size – size of the initial geometrical scan for alpha values.
Dmax_over_Rg – In average, protein’s Dmax is 3x Rg, use this to adjust

Returns:

BIFT object. Call the get_best to retrieve the optimal solution

freesas.containers

Set of namedtuples defined a bit everywhere

class RG_RESULT(Rg, sigma_Rg, I0, sigma_I0, start_point, end_point, quality, aggregated)

Bases: tuple

I0: Alias for field number 2

Rg: Alias for field number 0

aggregated: Alias for field number 7

end_point: Alias for field number 5

quality: Alias for field number 6

sigma_I0: Alias for field number 3

sigma_Rg: Alias for field number 1

start_point: Alias for field number 4

class FIT_RESULT(slope, sigma_slope, intercept, sigma_intercept, R, R2, chi2, RMSD)

Bases: tuple

R: Alias for field number 4

R2: Alias for field number 5

RMSD: Alias for field number 7

chi2: Alias for field number 6

intercept: Alias for field number 2

sigma_intercept: Alias for field number 3

sigma_slope: Alias for field number 1

slope: Alias for field number 0

class RT_RESULT(Vc, sigma_Vc, Qr, sigma_Qr, mass, sigma_mass)

Bases: tuple

Qr: Alias for field number 2

Vc: Alias for field number 0

mass: Alias for field number 4

sigma_Qr: Alias for field number 3

sigma_Vc: Alias for field number 1

sigma_mass: Alias for field number 5

class RadiusKey(Dmax, npt)

Bases: tuple

Dmax: Alias for field number 0

npt: Alias for field number 1

class PriorKey(type, npt)

Bases: tuple

npt: Alias for field number 1

type: Alias for field number 0

class TransfoValue(transfo, B, sum_dia)

Bases: tuple

B: Alias for field number 1

sum_dia: Alias for field number 2

transfo: Alias for field number 0

class EvidenceKey(Dmax, alpha, npt)

Bases: tuple

Dmax: Alias for field number 0

alpha: Alias for field number 1

npt: Alias for field number 2

class EvidenceResult(evidence, chi2r, regularization, radius, density, converged)

Bases: tuple

chi2r: Alias for field number 1

converged: Alias for field number 5

density: Alias for field number 4

evidence: Alias for field number 0

radius: Alias for field number 3

regularization: Alias for field number 2

class StatsResult(radius, density_avg, density_std, evidence_avg, evidence_std, Dmax_avg, Dmax_std, alpha_avg, alpha_std, chi2r_avg, chi2r_std, regularization_avg, regularization_std, Rg_avg, Rg_std, I0_avg, I0_std)

Bases: tuple

Dmax_avg: Alias for field number 5

Dmax_std: Alias for field number 6

I0_avg: Alias for field number 15

I0_std: Alias for field number 16

Rg_avg: Alias for field number 13

Rg_std: Alias for field number 14

alpha_avg: Alias for field number 7

alpha_std: Alias for field number 8

chi2r_avg: Alias for field number 9

chi2r_std: Alias for field number 10

density_avg: Alias for field number 1

density_std: Alias for field number 2

evidence_avg: Alias for field number 3

evidence_std: Alias for field number 4

radius: Alias for field number 0

regularization_avg: Alias for field number 11

regularization_std: Alias for field number 12

save(filename, source=None): Save the results of the fit to the file

save_bift(stats, filename, source=None)[source]: Save the results of the fit to the file

class GOF(n, c, P)

Bases: tuple

P: Alias for field number 2

c: Alias for field number 1

n: Alias for field number 0

freesas.cormap

class LongestRunOfHeads[source]

Bases: object

Implements the “longest run of heads” by Mark F. Schilling The College Mathematics Journal, Vol. 21, No. 3, (1990), pp. 196-207

See: http://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020742.02p0021g.pdf

A(n, c)[source]

Calculate A(number_of_toss, length_of_longest_run)

Parameters:

n – number of coin toss in the experiment, an integer
c – length of the longest run of

Returns:

The A parameter used in the formula

B(n, c)[source]

Calculate B(number_of_toss, length_of_longest_run) to have either a run of Heads either a run of Tails

Parameters:

n – number of coin toss in the experiment, an integer
c – length of the longest run of

Returns:

The B parameter used in the formula

probaHeadOrTail(n, c)[source]

Calculate the probability of a longest run of head or tails to occur

Parameters:

n – number of coin toss in the experiment, an integer
c – length of the longest run of heads or tails, an integer

Returns:

The probablility of having c subsequent heads or tails in a n toss of fair coin

probaLongerRun(n, c)[source]

Calculate the probability for the longest run of heads or tails to exceed the observed length

Parameters:

n – number of coin toss in the experiment, an integer
c – length of thee observed run of heads or tails, an integer

Returns:

The probablility of having more than c subsequent heads or tails in a n toss of fair coin

gof(data1, data2)[source]

Calculate the probability for a couple of dataset to be equivalent

Implementation according to: http://www.nature.com/nmeth/journal/v12/n5/full/nmeth.3358.html

Parameters:

data1 – numpy array
data2 – numpy array

Returns:

probablility for the 2 data to be equivalent

freesas.invariants

This module is mainly about the calculation of the Rambo-Tainer invariant described in:

https://dx.doi.org/10.1038%2Fnature12070

Some formula taken from Putnam et al, 2007, Table 1 in the review

extrapolate(data, guinier)[source]

Extrapolate SAS data according to the Guinier fit until q=0 Uncertainties are extrapolated (linearly) from the Guinier region

Parameters:

data – SAS data in q,I,dI format
guinier – result of a Guinier fit

Returns:

extrapolated SAS data

calc_Porod(data, guinier)[source]

Calculate the particle volume according to Porod’s formula:

V = 2*π²I₀²/(sum_q I(q)q² dq)

Formula from Putnam’s review, 2007, table 1 Intensities are extrapolated to q=0 using Guinier fit.

Parameters:

data – SAS data in q,I,dI format
Guinier – result of a Guinier fit (instance of RT_RESULT)

Returns:

Volume calculated according to Porrod’s formula

calc_Vc(data, Rg, dRg, I0, dI0, imin)[source]

Calculates the Rambo-Tainer invariant Vc, including extrapolation to q=0

Parameters:

data – SAS data in q,I,dI format, cropped to maximal q that should be used for calculation (normally 2 nm-1)
Rg,dRg,I0,dI0 – results from Guinier approximation/autorg
imin – minimal index of the Guinier range, below that index data will be extrapolated by the Guinier approximation

Returns:

Vc and an error estimate based on non-correlated error propagation

calc_Rambo_Tainer(data, guinier, qmax=2.0)[source]

calculates the invariants Vc and Qr from the Rambo & Tainer 2013 Paper, also the the mass estimate based on Qr for proteins

Parameters:

data – data in q,I,dI format, q in nm^-1
guinier – RG_RESULT instance with result from the Guinier fit
qmax – maximum q-value for the calculation in nm^-1

@return: dict with Vc, Qr and mass plus errors

freesas.model

delta_expand(vec1, vec2)[source]

Create a 2d array with the difference vec1[i]-vec2[j]

Parameters:: vec2 (vec1,) – 1d-array
Return v1 - v2:: difference for any element of v1 and v2 (i.e a 2D array)

class SASModel(molecule=None)[source]

Bases: object

Tools for Dummy Atoms Model manipulation

read(filename)[source]

Read the PDB file, extract coordinates of each dummy atom, extract the R-factor of the model, coordinates of each dummy atom and pdb file header.

Parameters:: filename – name of the pdb file to read

save(filename)[source]

Save the position of each dummy atom in a PDB file.

Parameters:: filename – name of the pdb file to write

centroid()[source]

Calculate the position of the center of mass of the molecule.

Return self.com:: 1d array, coordinates of the center of mass of the molecule

inertiatensor()[source]

calculate the inertia tensor of the protein

Return self.inertensor:: inertia tensor of the molecule

canonical_translate()[source]

Calculate the translation matrix to translate the center of mass of the molecule on the origin of the base.

Return trans:: translation matrix

canonical_rotate()[source]

Calculate the rotation matrix to align inertia momentum of the molecule on principal axis.

Return rot:: rotation matrix det==1

canonical_parameters()[source]: Save the 6 canonical parameters of the initial molecule: x0, y0, z0, the position of the center of mass phi, theta, psi, the three Euler angles of the canonical rotation (axis:x,y’,z’’)

calc_invariants(use_cython=True)[source]

Calculate the invariants of the structure:

fineness, ie. average distance between an atoms and its nearest neighbor
radius of gyration of the model
diameter of the model

Return invariants:: 3-tuple containing (fineness, Rg, Dmax)

property fineness

property Rg

property Dmax

dist(other, molecule1, molecule2, use_cython=True)[source]

Calculate the distance with another model.

Parameters:

self,other – two SASModel
molecule1 – 2d array of the position of each atom of the first molecule
molecule2 – 2d array of the position of each atom of the second molecule

Return D:

NSD between the 2 molecules, in their position molecule1 and molecule2

transform(param, symmetry, reverse=None)[source]

Calculate the new coordinates of each dummy atoms of the molecule after a transformation defined by six parameters and a symmetry

Parameters:

param – 6 parameters of transformation (3 coordinates of translation, 3 Euler angles)
symmetry – list of three constants which define a symmetry to apply

Return mol:

2d array, coordinates after transformation

dist_after_movement(param, other, symmetry)[source]

The first molecule, molref, is put on its canonical position. The second one, mol2, is moved following the transformation selected

Parameters:

param – list of 6 parameters for the transformation, 3 coordinates of translation and 3 Euler angles
symmetry – list of three constants which define a symmetry to apply

Return distance:

the NSD between the first molecule and the second one after its movement

freesas.plot

Functions to generating graphs related to SAS.

scatter_plot(data, guinier=None, ift=None, filename=None, img_format='svg', unit='nm', title='Scattering curve', ax=None, labelsize=None, fontsize=None)[source]

Generate a scattering plot I = f(q) in semi_log_y.

Parameters:

data – data read from an ASCII file, 3 column (q,I,err)
filename – name of the file where the cuve should be saved
img_format – image format
unit – Unit name for Rg and 1/q
guinier – output of autoRg
ift – converged instance of BIFT (output of auto_bift)
ax – subplot where the plot shall go in

Returns:

the matplotlib figure

kratky_plot(data, guinier, filename=None, img_format='svg', unit='nm', title='Dimensionless Kratky plot', ax=None, labelsize=None, fontsize=None)[source]

Generate a Kratky plot q²Rg²I/I₀ = f(q·Rg)

Parameters:

data – data read from an ASCII file, 3 column (q,I,err)
guinier – output of autoRg
filename – name of the file where the cuve should be saved
img_format – image format
unit – Unit name for Rg and 1/q
ax – subplot where the plot shall go in

Returns:

the matplotlib figure

guinier_plot(data, guinier, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]

Generate a guinier plot: ln(I) = f(q²)

Parameters:

data – data read from an ASCII file, 3 column (q,I,err)
guinier – A RG_RESULT object from AutoRg
filename – name of the file where the cuve should be saved
img_format – image format

Param:

ax: subplot where to plot in

Returns:

the matplotlib figure

density_plot(ift, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]

Generate a density plot p(r)

Parameters:

ift – An IFT result comming out of BIFT
filename – name of the file where the cuve should be saved
img_format – image image format
ax – subplotib where to plot in

Returns:

the matplotlib figure

plot_all(data, filename=None, img_format=None, unit='nm', labelsize=None, fontsize=None)[source]

hplc_plot(hplc, fractions=None, title='Chromatogram', filename=None, img_format='png', ax=None, labelsize=None, fontsize=None)[source]

Generate an HPLC plot I=f(t)

Parameters:

hplc – stack of diffraction data
fractions – list of 2tuple with first and last ndex if each fraction
filename – name of the file where the cuve should be saved
img_format – image image format
ax – subplotib where to plot in

Returns:

the matplotlib figure

freesas.sasio

Contains helper functions for loading SAS data from differents sources.

load_scattering_data(filename: PathLike | str | bytes | StringIO | BytesIO) → ndarray[source]

Load scattering data q, I, err into a numpy array.

Parameters:: filename – ASCII file, 3 column (q,I,err)
Returns:: numpy array with 3 column (q,I,err)

parse_ascii_data(input_file_text: List[str], number_of_columns: int) → ndarray[source]

Parse data from an ascii file into an N column numpy array

Parameters:

input_file_text – List containing one line of input data per element
number_of_columns – Expected number of lines in the data file

Returns:

numpy array with 3 column (q,I,err)

convert_inverse_angstrom_to_nanometer(data_in_inverse_angstrom: ndarray) → ndarray[source]

Convert data with q in 1/Å to 1/nm.

Parameters:: data_in_inverse_angstrom – numpy array in format (q_in_inverse_Angstrom,I,err)
Returns:: numpy array with 3 column (q_in_inverse_nm,I,err)