FreeSAS Package
This is the public API of FreeSAS. Since most of the number crunching part is implemented in Cython, it is not documented here: please refer to the source code.
freesas.align
- class InputModels[source]
Bases:
object
- assign_models(molecule=None)[source]
Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.
- Parameters:
molecule – optional 2d array, coordinates of the atoms for the model to create
- Return self.models:
list of SASModel
- rcalculation()[source]
Calculation the maximal value for the R-factors, which is the mean of all the R-factors of inputs plus 2 times the standard deviation. R-factors are saved in the attribute self.rfactors, 1d array, and in percentage.
- Return rmax:
maximal value for the R-factor
- models_selection()[source]
Check if each model respect the limit for the R-factor
- Return self.validmodels:
1d array, 0 for a non valid model, else 1
- rfactorplot(filename=None, save=False)[source]
Create a png file with the table of R factor for each model. A threshold is computed to discarded models with Rfactor>Rmax.
- Parameters:
filename – filename for the figure, default to Rfactor.png
save – save automatically the figure if True, else show it
- Return fig:
the wanted figures
- class AlignModels(files, slow=True, enantiomorphs=True)[source]
Bases:
object
Used to align DAM from pdb files
- assign_models()[source]
Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.
- Return self.models:
list of SASModel
- optimize(reference, molecule, symmetry)[source]
Use scipy.optimize to optimize transformation parameters to minimize NSD
- Parameters:
reference – SASmodel
molecule – SASmodel
symmetry – 3-list of +/-1
- Return p:
transformation parameters optimized
- Return dist:
NSD after optimization
- alignment_sym(reference, molecule)[source]
Apply 8 combinations to the molecule and select the one which minimize the distance between it and the reference.
- Parameters:
reference – SASModel, the one which do not move
molecule – SASModel, the one wich has to be aligned
- Return combinaison:
best symmetry to minimize NSD
- Return p:
transformation parameters optimized if slow is true, unoptimized else
- makeNSDarray()[source]
Calculate the NSD correlation table and save it in self.arrayNSD
- Return self.arrayNSD:
2d array, NSD correlation table
- plotNSDarray(rmax=None, filename=None, save=False)[source]
Create a png file with the table of NSD and the average NSD for each model. A threshold is computed to segregate good models and the ones to exclude.
- Parameters:
rmax – threshold of R factor for the validity of a model
filename – filename for the figure, default to nsd.png
save – save automatically the figure if True, else show it
- Return fig:
the wanted figures
- find_reference()[source]
Find the reference model among the models aligned. The reference model is the one with lower average NSD with other models.
- Return ref_number:
position of the reference model in the list self.models
freesas.autorg
Functions for calculating the radius of gyration and forward scattering intensity.
- auto_gpa(data, Rg_min=1.0, qRg_max=1.3, qRg_min=0.5)[source]
Uses the GPA theory to guess quickly Rg, the radius of gyration and I0, the forwards scattering
The theory is described in Guinier peak analysis for visual and automated inspection of small-angle X-ray scattering data Christopher D. Putnam J. Appl. Cryst. (2016). 49, 1412–1419
This fits sqrt(q²Rg²)*exp(-q²Rg²/3)*I0/Rg to the curve I*q = f(q²)
The Guinier region goes arbitrary from 0.5 to 1.3 q·Rg qRg_min and qRg_max can be provided
- Parameters:
data – the raw data read from disc. Only q and I are used.
Rg_min – the minimal accpetable value for the radius of gyration
qRg_max – the default upper bound for the Guinier region.
qRg_min – the default lower bound for the Guinier region.
- Returns:
autRg result with limited information
- auto_guinier(data, Rg_min=1.0, qRg_max=1.3, relax=1.2)[source]
Yet another implementation of the Guinier fit
The idea: * extract the reasonable range * convert to the Guinier space (ln(I) = f(q²) * scan all possible intervall * keep any with qRg_max<1.3 (or 1.5 in relaxed mode) * select the begining and the end of the guinier region according to the contribution of two parameters:
(q_max·Rg - q_min·Rg)/qRg_max –> in favor of large ranges
1 / RMSD –> in favor of good quality data
For each start and end point, the contribution of all ranges are averaged out (using histograms) The best solution is the start/end position with the maximum average.
All ranges within this region are averaged out to measure Rg, I0 and more importantly their deviation.
The quality is still to be calculated
Aggergation is assessed according a second order polynom fit.
- Parameters:
data – 2D array with (q,I,err)
Rg_min – minimum value for Rg
qRg_max – upper bound of the Guinier region
relax – relaxation factor for the upper bound
resolution – step size of the slope histogram
- Returns:
autRg result
freesas.average
- class Grid(inputfiles)[source]
Bases:
object
This class is used to create a grid which include all the input models
- spatial_extent()[source]
Calculate the maximal extent of input models
- Return self.size:
6-list with x,y,z max and then x,y,z min
- class AverModels(inputfiles, grid)[source]
Bases:
object
Provides tools to create an averaged models using several aligned dummy atom models
- read_files(reference=None)[source]
Read all the pdb file in the inputfiles list, creating SASModels. The SASModels created are save in a list, the reference model is the first model in the list.
- Parameters:
reference – position of the reference model file in the inputfiles list
- calc_occupancy(griddot)[source]
Assign an occupancy and a contribution factor to the point of the grid.
- Parameters:
griddot – 1d-array, coordinates of a point of the grid
- Return tuple:
2-tuple containing (occupancy, contribution)
- assign_occupancy()[source]
For each point of the grid, total occupancy and contribution factor are computed and saved. The grid is then ordered with decreasing value of occupancy. The fourth column of the array correspond to the occupancy of the point and the fifth to the contribution for this point.
- Return sortedgrid:
2d-array, coordinates of each point of the grid
freesas.bift
Bayesian Inverse Fourier Transform
This code is the implementation of Steen Hansen J. Appl. Cryst. (2000). 33, 1415-1421
Based on the BIFT from Jesse Hopkins, available at: https://sourceforge.net/p/bioxtasraw/git/ci/master/tree/bioxtasraw/BIFT.py
Many thanks to Pierre Paleo for the auto-alpha guess
- auto_bift(data, Dmax=None, alpha=None, npt=100, start_point=None, end_point=None, scan_size=11, Dmax_over_Rg=3)[source]
Calculates the inverse Fourier tranform of the data using an optimisation of the evidence
- Parameters:
data – 2D array with q, I(q), δI(q). q can be in 1/nm or 1/A, it imposes the unit for r & Dmax
Dmax – Maximum diameter of the object, this is the starting point to be refined. Can be guessed
alpha – Regularisation parameter, let it to None for automatic scan
npt – Number of point for the curve p(r)
start_point – First useable point in the I(q) curve, this is not the start of the Guinier region
end_point – Last useable point in the I(q) curve
scan_size – size of the initial geometrical scan for alpha values.
Dmax_over_Rg – In average, protein’s Dmax is 3x Rg, use this to adjust
- Returns:
BIFT object. Call the get_best to retrieve the optimal solution
freesas.containers
Set of namedtuples defined a bit everywhere
- class RG_RESULT(Rg, sigma_Rg, I0, sigma_I0, start_point, end_point, quality, aggregated)
Bases:
tuple
- I0
Alias for field number 2
- Rg
Alias for field number 0
- aggregated
Alias for field number 7
- end_point
Alias for field number 5
- quality
Alias for field number 6
- sigma_I0
Alias for field number 3
- sigma_Rg
Alias for field number 1
- start_point
Alias for field number 4
- class FIT_RESULT(slope, sigma_slope, intercept, sigma_intercept, R, R2, chi2, RMSD)
Bases:
tuple
- R
Alias for field number 4
- R2
Alias for field number 5
- RMSD
Alias for field number 7
- chi2
Alias for field number 6
- intercept
Alias for field number 2
- sigma_intercept
Alias for field number 3
- sigma_slope
Alias for field number 1
- slope
Alias for field number 0
- class RT_RESULT(Vc, sigma_Vc, Qr, sigma_Qr, mass, sigma_mass)
Bases:
tuple
- Qr
Alias for field number 2
- Vc
Alias for field number 0
- mass
Alias for field number 4
- sigma_Qr
Alias for field number 3
- sigma_Vc
Alias for field number 1
- sigma_mass
Alias for field number 5
- class RadiusKey(Dmax, npt)
Bases:
tuple
- Dmax
Alias for field number 0
- npt
Alias for field number 1
- class PriorKey(type, npt)
Bases:
tuple
- npt
Alias for field number 1
- type
Alias for field number 0
- class TransfoValue(transfo, B, sum_dia)
Bases:
tuple
- B
Alias for field number 1
- sum_dia
Alias for field number 2
- transfo
Alias for field number 0
- class EvidenceKey(Dmax, alpha, npt)
Bases:
tuple
- Dmax
Alias for field number 0
- alpha
Alias for field number 1
- npt
Alias for field number 2
- class EvidenceResult(evidence, chi2r, regularization, radius, density, converged)
Bases:
tuple
- chi2r
Alias for field number 1
- converged
Alias for field number 5
- density
Alias for field number 4
- evidence
Alias for field number 0
- radius
Alias for field number 3
- regularization
Alias for field number 2
- class StatsResult(radius, density_avg, density_std, evidence_avg, evidence_std, Dmax_avg, Dmax_std, alpha_avg, alpha_std, chi2r_avg, chi2r_std, regularization_avg, regularization_std, Rg_avg, Rg_std, I0_avg, I0_std)
Bases:
tuple
- Dmax_avg
Alias for field number 5
- Dmax_std
Alias for field number 6
- I0_avg
Alias for field number 15
- I0_std
Alias for field number 16
- Rg_avg
Alias for field number 13
- Rg_std
Alias for field number 14
- alpha_avg
Alias for field number 7
- alpha_std
Alias for field number 8
- chi2r_avg
Alias for field number 9
- chi2r_std
Alias for field number 10
- density_avg
Alias for field number 1
- density_std
Alias for field number 2
- evidence_avg
Alias for field number 3
- evidence_std
Alias for field number 4
- radius
Alias for field number 0
- regularization_avg
Alias for field number 11
- regularization_std
Alias for field number 12
- save(filename, source=None)
Save the results of the fit to the file
freesas.cormap
- class LongestRunOfHeads[source]
Bases:
object
Implements the “longest run of heads” by Mark F. Schilling The College Mathematics Journal, Vol. 21, No. 3, (1990), pp. 196-207
See: http://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020742.02p0021g.pdf
- A(n, c)[source]
Calculate A(number_of_toss, length_of_longest_run)
- Parameters:
n – number of coin toss in the experiment, an integer
c – length of the longest run of
- Returns:
The A parameter used in the formula
- B(n, c)[source]
Calculate B(number_of_toss, length_of_longest_run) to have either a run of Heads either a run of Tails
- Parameters:
n – number of coin toss in the experiment, an integer
c – length of the longest run of
- Returns:
The B parameter used in the formula
- probaHeadOrTail(n, c)[source]
Calculate the probability of a longest run of head or tails to occur
- Parameters:
n – number of coin toss in the experiment, an integer
c – length of the longest run of heads or tails, an integer
- Returns:
The probablility of having c subsequent heads or tails in a n toss of fair coin
- probaLongerRun(n, c)[source]
Calculate the probability for the longest run of heads or tails to exceed the observed length
- Parameters:
n – number of coin toss in the experiment, an integer
c – length of thee observed run of heads or tails, an integer
- Returns:
The probablility of having more than c subsequent heads or tails in a n toss of fair coin
- gof(data1, data2)[source]
Calculate the probability for a couple of dataset to be equivalent
Implementation according to: http://www.nature.com/nmeth/journal/v12/n5/full/nmeth.3358.html
- Parameters:
data1 – numpy array
data2 – numpy array
- Returns:
probablility for the 2 data to be equivalent
freesas.invariants
This module is mainly about the calculation of the Rambo-Tainer invariant described in:
https://dx.doi.org/10.1038%2Fnature12070
Some formula taken from Putnam et al, 2007, Table 1 in the review
- extrapolate(data, guinier)[source]
Extrapolate SAS data according to the Guinier fit until q=0 Uncertainties are extrapolated (linearly) from the Guinier region
- Parameters:
data – SAS data in q,I,dI format
guinier – result of a Guinier fit
- Returns:
extrapolated SAS data
- calc_Porod(data, guinier)[source]
Calculate the particle volume according to Porod’s formula:
V = 2*π²I₀²/(sum_q I(q)q² dq)
Formula from Putnam’s review, 2007, table 1 Intensities are extrapolated to q=0 using Guinier fit.
- Parameters:
data – SAS data in q,I,dI format
Guinier – result of a Guinier fit (instance of RT_RESULT)
- Returns:
Volume calculated according to Porrod’s formula
- calc_Vc(data, Rg, dRg, I0, dI0, imin)[source]
Calculates the Rambo-Tainer invariant Vc, including extrapolation to q=0
- Parameters:
data – SAS data in q,I,dI format, cropped to maximal q that should be used for calculation (normally 2 nm-1)
Rg,dRg,I0,dI0 – results from Guinier approximation/autorg
imin – minimal index of the Guinier range, below that index data will be extrapolated by the Guinier approximation
- Returns:
Vc and an error estimate based on non-correlated error propagation
- calc_Rambo_Tainer(data, guinier, qmax=2.0)[source]
calculates the invariants Vc and Qr from the Rambo & Tainer 2013 Paper, also the the mass estimate based on Qr for proteins
- Parameters:
data – data in q,I,dI format, q in nm^-1
guinier – RG_RESULT instance with result from the Guinier fit
qmax – maximum q-value for the calculation in nm^-1
@return: dict with Vc, Qr and mass plus errors
freesas.model
- delta_expand(vec1, vec2)[source]
Create a 2d array with the difference vec1[i]-vec2[j]
- Parameters:
vec2 (vec1,) – 1d-array
- Return v1 - v2:
difference for any element of v1 and v2 (i.e a 2D array)
- class SASModel(molecule=None)[source]
Bases:
object
Tools for Dummy Atoms Model manipulation
- read(filename)[source]
Read the PDB file, extract coordinates of each dummy atom, extract the R-factor of the model, coordinates of each dummy atom and pdb file header.
- Parameters:
filename – name of the pdb file to read
- save(filename)[source]
Save the position of each dummy atom in a PDB file.
- Parameters:
filename – name of the pdb file to write
- centroid()[source]
Calculate the position of the center of mass of the molecule.
- Return self.com:
1d array, coordinates of the center of mass of the molecule
- inertiatensor()[source]
calculate the inertia tensor of the protein
- Return self.inertensor:
inertia tensor of the molecule
- canonical_translate()[source]
Calculate the translation matrix to translate the center of mass of the molecule on the origin of the base.
- Return trans:
translation matrix
- canonical_rotate()[source]
Calculate the rotation matrix to align inertia momentum of the molecule on principal axis.
- Return rot:
rotation matrix det==1
- canonical_parameters()[source]
Save the 6 canonical parameters of the initial molecule: x0, y0, z0, the position of the center of mass phi, theta, psi, the three Euler angles of the canonical rotation (axis:x,y’,z’’)
- calc_invariants(use_cython=True)[source]
- Calculate the invariants of the structure:
fineness, ie. average distance between an atoms and its nearest neighbor
radius of gyration of the model
diameter of the model
- Return invariants:
3-tuple containing (fineness, Rg, Dmax)
- property fineness
- property Rg
- property Dmax
- dist(other, molecule1, molecule2, use_cython=True)[source]
Calculate the distance with another model.
- Parameters:
self,other – two SASModel
molecule1 – 2d array of the position of each atom of the first molecule
molecule2 – 2d array of the position of each atom of the second molecule
- Return D:
NSD between the 2 molecules, in their position molecule1 and molecule2
- transform(param, symmetry, reverse=None)[source]
Calculate the new coordinates of each dummy atoms of the molecule after a transformation defined by six parameters and a symmetry
- Parameters:
param – 6 parameters of transformation (3 coordinates of translation, 3 Euler angles)
symmetry – list of three constants which define a symmetry to apply
- Return mol:
2d array, coordinates after transformation
- dist_after_movement(param, other, symmetry)[source]
The first molecule, molref, is put on its canonical position. The second one, mol2, is moved following the transformation selected
- Parameters:
param – list of 6 parameters for the transformation, 3 coordinates of translation and 3 Euler angles
symmetry – list of three constants which define a symmetry to apply
- Return distance:
the NSD between the first molecule and the second one after its movement
freesas.plot
Functions to generating graphs related to SAS.
- scatter_plot(data, guinier=None, ift=None, filename=None, img_format='svg', unit='nm', title='Scattering curve', ax=None, labelsize=None, fontsize=None)[source]
Generate a scattering plot I = f(q) in semi_log_y.
- Parameters:
data – data read from an ASCII file, 3 column (q,I,err)
filename – name of the file where the cuve should be saved
img_format – image format
unit – Unit name for Rg and 1/q
guinier – output of autoRg
ift – converged instance of BIFT (output of auto_bift)
ax – subplot where the plot shall go in
- Returns:
the matplotlib figure
- kratky_plot(data, guinier, filename=None, img_format='svg', unit='nm', title='Dimensionless Kratky plot', ax=None, labelsize=None, fontsize=None)[source]
Generate a Kratky plot q²Rg²I/I₀ = f(q·Rg)
- Parameters:
data – data read from an ASCII file, 3 column (q,I,err)
guinier – output of autoRg
filename – name of the file where the cuve should be saved
img_format – image format
unit – Unit name for Rg and 1/q
ax – subplot where the plot shall go in
- Returns:
the matplotlib figure
- guinier_plot(data, guinier, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]
Generate a guinier plot: ln(I) = f(q²)
- Parameters:
data – data read from an ASCII file, 3 column (q,I,err)
guinier – A RG_RESULT object from AutoRg
filename – name of the file where the cuve should be saved
img_format – image format
- Param:
ax: subplot where to plot in
- Returns:
the matplotlib figure
- density_plot(ift, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]
Generate a density plot p(r)
- Parameters:
ift – An IFT result comming out of BIFT
filename – name of the file where the cuve should be saved
img_format – image image format
ax – subplotib where to plot in
- Returns:
the matplotlib figure
- hplc_plot(hplc, fractions=None, title='Chromatogram', filename=None, img_format='png', ax=None, labelsize=None, fontsize=None)[source]
Generate an HPLC plot I=f(t)
- Parameters:
hplc – stack of diffraction data
fractions – list of 2tuple with first and last ndex if each fraction
filename – name of the file where the cuve should be saved
img_format – image image format
ax – subplotib where to plot in
- Returns:
the matplotlib figure
freesas.sasio
Contains helper functions for loading SAS data from differents sources.
- load_scattering_data(filename: PathLike | str | bytes | StringIO | BytesIO) ndarray [source]
Load scattering data q, I, err into a numpy array.
- Parameters:
filename – ASCII file, 3 column (q,I,err)
- Returns:
numpy array with 3 column (q,I,err)
- parse_ascii_data(input_file_text: List[str], number_of_columns: int) ndarray [source]
Parse data from an ascii file into an N column numpy array
- Parameters:
input_file_text – List containing one line of input data per element
number_of_columns – Expected number of lines in the data file
- Returns:
numpy array with 3 column (q,I,err)