FreeSAS Package¶
This is the public API of FreeSAS. Since most of the number crunching part is implemented in Cython, it is not documented here: please refer to the source code.
freesas.align¶
-
class
InputModels
[source]¶ Bases:
object
-
assign_models
(molecule=None)[source]¶ Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.
Parameters: molecule – optional 2d array, coordinates of the atoms for the model to create Return self.models: list of SASModel
-
rcalculation
()[source]¶ Calculation the maximal value for the R-factors, which is the mean of all the R-factors of inputs plus 2 times the standard deviation. R-factors are saved in the attribute self.rfactors, 1d array, and in percentage.
Return rmax: maximal value for the R-factor
-
models_selection
()[source]¶ Check if each model respect the limit for the R-factor
Return self.validmodels: 1d array, 0 for a non valid model, else 1
-
rfactorplot
(filename=None, save=False)[source]¶ Create a png file with the table of R factor for each model. A threshold is computed to discarded models with Rfactor>Rmax.
Parameters: - filename – filename for the figure, default to Rfactor.png
- save – save automatically the figure if True, else show it
Return fig: the wanted figures
-
-
class
AlignModels
(files, slow=True, enantiomorphs=True)[source]¶ Bases:
object
Used to align DAM from pdb files
-
assign_models
()[source]¶ Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.
Return self.models: list of SASModel
-
optimize
(reference, molecule, symmetry)[source]¶ Use scipy.optimize to optimize transformation parameters to minimize NSD
Parameters: - reference – SASmodel
- molecule – SASmodel
- symmetry – 3-list of +/-1
Return p: transformation parameters optimized
Return dist: NSD after optimization
-
alignment_sym
(reference, molecule)[source]¶ Apply 8 combinations to the molecule and select the one which minimize the distance between it and the reference.
Parameters: - reference – SASModel, the one which do not move
- molecule – SASModel, the one wich has to be aligned
Return combinaison: best symmetry to minimize NSD
Return p: transformation parameters optimized if slow is true, unoptimized else
-
makeNSDarray
()[source]¶ Calculate the NSD correlation table and save it in self.arrayNSD
Return self.arrayNSD: 2d array, NSD correlation table
-
plotNSDarray
(rmax=None, filename=None, save=False)[source]¶ Create a png file with the table of NSD and the average NSD for each model. A threshold is computed to segregate good models and the ones to exclude.
Parameters: - rmax – threshold of R factor for the validity of a model
- filename – filename for the figure, default to nsd.png
- save – save automatically the figure if True, else show it
Return fig: the wanted figures
-
find_reference
()[source]¶ Find the reference model among the models aligned. The reference model is the one with lower average NSD with other models.
Return ref_number: position of the reference model in the list self.models
-
freesas.autorg¶
Functions to generating graphs related to
-
auto_gpa
(data, Rg_min=1.0, qRg_max=1.3, qRg_min=0.5)[source]¶ Uses the GPA theory to guess quickly Rg, the radius of gyration and I0, the forwards scattering
The theory is described in Guinier peak analysis for visual and automated inspection of small-angle X-ray scattering data Christopher D. Putnam J. Appl. Cryst. (2016). 49, 1412–1419
This fits sqrt(q²Rg²)*exp(-q²Rg²/3)*I0/Rg to the curve I*q = f(q²)
The Guinier region goes arbitrary from 0.5 to 1.3 q·Rg qRg_min and qRg_max can be provided
Parameters: - data – the raw data read from disc. Only q and I are used.
- Rg_min – the minimal accpetable value for the radius of gyration
- qRg_max – the default upper bound for the Guinier region.
- qRg_min – the default lower bound for the Guinier region.
Returns: autRg result with limited information
-
auto_guinier
(data, Rg_min=1.0, qRg_max=1.3, relax=1.2)[source]¶ Yet another implementation of the Guinier fit
The idea: * extract the reasonable range * convert to the Guinier space (ln(I) = f(q²) * scan all possible intervall * keep any with qRg_max<1.3 (or 1.5 in relaxed mode) * select the begining and the end of the guinier region according to the contribution of two parameters:
- (q_max·Rg - q_min·Rg)/qRg_max –> in favor of large ranges
- 1 / RMSD –> in favor of good quality data
For each start and end point, the contribution of all ranges are averaged out (using histograms) The best solution is the start/end position with the maximum average.
- All ranges within this region are averaged out to measure Rg, I0 and more importantly their deviation.
- The quality is still to be calculated
- Aggergation is assessed according a second order polynom fit.
Parameters: - data – 2D array with (q,I,err)
- Rg_min – minimum value for Rg
- qRg_max – upper bound of the Guinier region
- relax – relaxation factor for the upper bound
- resolution – step size of the slope histogram
Returns: autRg result
freesas.average¶
-
class
Grid
(inputfiles)[source]¶ Bases:
object
This class is used to create a grid which include all the input models
-
spatial_extent
()[source]¶ Calculate the maximal extent of input models
Return self.size: 6-list with x,y,z max and then x,y,z min
-
-
class
AverModels
(inputfiles, grid)[source]¶ Bases:
object
Provides tools to create an averaged models using several aligned dummy atom models
-
read_files
(reference=None)[source]¶ Read all the pdb file in the inputfiles list, creating SASModels. The SASModels created are save in a list, the reference model is the first model in the list.
Parameters: reference – position of the reference model file in the inputfiles list
-
calc_occupancy
(griddot)[source]¶ Assign an occupancy and a contribution factor to the point of the grid.
Parameters: griddot – 1d-array, coordinates of a point of the grid Return tuple: 2-tuple containing (occupancy, contribution)
-
assign_occupancy
()[source]¶ For each point of the grid, total occupancy and contribution factor are computed and saved. The grid is then ordered with decreasing value of occupancy. The fourth column of the array correspond to the occupancy of the point and the fifth to the contribution for this point.
Return sortedgrid: 2d-array, coordinates of each point of the grid
-
freesas.bift¶
Bayesian Inverse Fourier Transform
This code is the implementation of Steen Hansen J. Appl. Cryst. (2000). 33, 1415-1421
Based on the BIFT from Jesse Hopkins, available at: https://sourceforge.net/p/bioxtasraw/git/ci/master/tree/bioxtasraw/BIFT.py
Many thanks to Pierre Paleo for the auto-alpha guess
-
auto_bift
(data, Dmax=None, alpha=None, npt=100, start_point=None, end_point=None, scan_size=11, Dmax_over_Rg=3)[source]¶ Calculates the inverse Fourier tranform of the data using an optimisation of the evidence
Parameters: - data – 2D array with q, I(q), δI(q). q can be in 1/nm or 1/A, it imposes the unit for r & Dmax
- Dmax – Maximum diameter of the object, this is the starting point to be refined. Can be guessed
- alpha – Regularisation parameter, let it to None for automatic scan
- npt – Number of point for the curve p(r)
- start_point – First useable point in the I(q) curve, this is not the start of the Guinier region
- end_point – Last useable point in the I(q) curve
- scan_size – size of the initial geometrical scan for alpha values.
- Dmax_over_Rg – In average, protein’s Dmax is 3x Rg, use this to adjust
Returns: BIFT object. Call the get_best to retrieve the optimal solution
freesas.collections¶
Set of namedtuples defined a bit everywhere
-
class
RG_RESULT
(Rg, sigma_Rg, I0, sigma_I0, start_point, end_point, quality, aggregated)¶ Bases:
tuple
-
I0
¶ Alias for field number 2
-
Rg
¶ Alias for field number 0
-
aggregated
¶ Alias for field number 7
-
end_point
¶ Alias for field number 5
-
quality
¶ Alias for field number 6
-
sigma_I0
¶ Alias for field number 3
-
sigma_Rg
¶ Alias for field number 1
-
start_point
¶ Alias for field number 4
-
-
class
FIT_RESULT
(slope, sigma_slope, intercept, sigma_intercept, R, R2, chi2, RMSD)¶ Bases:
tuple
-
R
¶ Alias for field number 4
-
R2
¶ Alias for field number 5
-
RMSD
¶ Alias for field number 7
-
chi2
¶ Alias for field number 6
-
intercept
¶ Alias for field number 2
-
sigma_intercept
¶ Alias for field number 3
-
sigma_slope
¶ Alias for field number 1
-
slope
¶ Alias for field number 0
-
-
class
RT_RESULT
(Vc, sigma_Vc, Qr, sigma_Qr, mass, sigma_mass)¶ Bases:
tuple
-
Qr
¶ Alias for field number 2
-
Vc
¶ Alias for field number 0
-
mass
¶ Alias for field number 4
-
sigma_Qr
¶ Alias for field number 3
-
sigma_Vc
¶ Alias for field number 1
-
sigma_mass
¶ Alias for field number 5
-
-
class
RadiusKey
(Dmax, npt)¶ Bases:
tuple
-
Dmax
¶ Alias for field number 0
-
npt
¶ Alias for field number 1
-
-
class
PriorKey
(type, npt)¶ Bases:
tuple
-
npt
¶ Alias for field number 1
-
type
¶ Alias for field number 0
-
-
class
TransfoValue
(transfo, B, sum_dia)¶ Bases:
tuple
-
B
¶ Alias for field number 1
-
sum_dia
¶ Alias for field number 2
-
transfo
¶ Alias for field number 0
-
-
class
EvidenceKey
(Dmax, alpha, npt)¶ Bases:
tuple
-
Dmax
¶ Alias for field number 0
-
alpha
¶ Alias for field number 1
-
npt
¶ Alias for field number 2
-
-
class
EvidenceResult
(evidence, chi2r, regularization, radius, density, converged)¶ Bases:
tuple
-
chi2r
¶ Alias for field number 1
-
converged
¶ Alias for field number 5
-
density
¶ Alias for field number 4
-
evidence
¶ Alias for field number 0
-
radius
¶ Alias for field number 3
-
regularization
¶ Alias for field number 2
-
-
class
StatsResult
(radius, density_avg, density_std, evidence_avg, evidence_std, Dmax_avg, Dmax_std, alpha_avg, alpha_std, chi2r_avg, chi2r_std, regularization_avg, regularization_std, Rg_avg, Rg_std, I0_avg, I0_std)¶ Bases:
tuple
-
Dmax_avg
¶ Alias for field number 5
-
Dmax_std
¶ Alias for field number 6
-
I0_avg
¶ Alias for field number 15
-
I0_std
¶ Alias for field number 16
-
Rg_avg
¶ Alias for field number 13
-
Rg_std
¶ Alias for field number 14
-
alpha_avg
¶ Alias for field number 7
-
alpha_std
¶ Alias for field number 8
-
chi2r_avg
¶ Alias for field number 9
-
chi2r_std
¶ Alias for field number 10
-
density_avg
¶ Alias for field number 1
-
density_std
¶ Alias for field number 2
-
evidence_avg
¶ Alias for field number 3
-
evidence_std
¶ Alias for field number 4
-
radius
¶ Alias for field number 0
-
regularization_avg
¶ Alias for field number 11
-
regularization_std
¶ Alias for field number 12
-
save
(filename, source=None)¶ Save the results of the fit to the file
-
freesas.cormap¶
-
class
LongestRunOfHeads
[source]¶ Bases:
object
Implements the “longest run of heads” by Mark F. Schilling The College Mathematics Journal, Vol. 21, No. 3, (1990), pp. 196-207
See: http://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020742.02p0021g.pdf
-
A
(n, c)[source]¶ Calculate A(number_of_toss, length_of_longest_run)
Parameters: - n – number of coin toss in the experiment, an integer
- c – length of the longest run of
Returns: The A parameter used in the formula
-
B
(n, c)[source]¶ Calculate B(number_of_toss, length_of_longest_run) to have either a run of Heads either a run of Tails
Parameters: - n – number of coin toss in the experiment, an integer
- c – length of the longest run of
Returns: The B parameter used in the formula
-
probaHeadOrTail
(n, c)[source]¶ Calculate the probability of a longest run of head or tails to occur
Parameters: - n – number of coin toss in the experiment, an integer
- c – length of the longest run of heads or tails, an integer
Returns: The probablility of having c subsequent heads or tails in a n toss of fair coin
-
probaLongerRun
(n, c)[source]¶ Calculate the probability for the longest run of heads or tails to exceed the observed length
Parameters: - n – number of coin toss in the experiment, an integer
- c – length of thee observed run of heads or tails, an integer
Returns: The probablility of having more than c subsequent heads or tails in a n toss of fair coin
-
-
gof
(data1, data2)[source]¶ Calculate the probability for a couple of dataset to be equivalent
Implementation according to: http://www.nature.com/nmeth/journal/v12/n5/full/nmeth.3358.html
Parameters: - data1 – numpy array
- data2 – numpy array
Returns: probablility for the 2 data to be equivalent
freesas.invariants¶
This module is mainly about the calculation of the Rambo-Tainer invariant described in:
https://dx.doi.org/10.1038%2Fnature12070
Some formula taken from Putnam et al, 2007, Table 1 in the review
-
extrapolate
(data, guinier)[source]¶ Extrapolate SAS data according to the Guinier fit until q=0 Uncertainties are extrapolated (linearly) from the Guinier region
Parameters: - data – SAS data in q,I,dI format
- guinier – result of a Guinier fit
Returns: extrapolated SAS data
-
calc_Porod
(data, guinier)[source]¶ Calculate the particle volume according to Porod’s formula:
V = 2*π²I₀²/(sum_q I(q)q² dq)
Formula from Putnam’s review, 2007, table 1 Intensities are extrapolated to q=0 using Guinier fit.
Parameters: - data – SAS data in q,I,dI format
- Guinier – result of a Guinier fit (instance of RT_RESULT)
Returns: Volume calculated according to Porrod’s formula
-
calc_Vc
(data, Rg, dRg, I0, dI0, imin)[source]¶ Calculates the Rambo-Tainer invariant Vc, including extrapolation to q=0
Parameters: - data – SAS data in q,I,dI format, cropped to maximal q that should be used for calculation (normally 2 nm-1)
- Rg,dRg,I0,dI0 – results from Guinier approximation/autorg
- imin – minimal index of the Guinier range, below that index data will be extrapolated by the Guinier approximation
Returns: Vc and an error estimate based on non-correlated error propagation
-
calc_Rambo_Tainer
(data, guinier, qmax=2.0)[source]¶ calculates the invariants Vc and Qr from the Rambo & Tainer 2013 Paper, also the the mass estimate based on Qr for proteins
Parameters: - data – data in q,I,dI format, q in nm^-1
- guinier – RG_RESULT instance with result from the Guinier fit
- qmax – maximum q-value for the calculation in nm^-1
@return: dict with Vc, Qr and mass plus errors
freesas.model¶
-
delta_expand
(vec1, vec2)[source]¶ Create a 2d array with the difference vec1[i]-vec2[j]
Parameters: vec2 (vec1,) – 1d-array Return v1 - v2: difference for any element of v1 and v2 (i.e a 2D array)
-
class
SASModel
(molecule=None)[source]¶ Bases:
object
Tools for Dummy Atoms Model manipulation
-
read
(filename)[source]¶ Read the PDB file, extract coordinates of each dummy atom, extract the R-factor of the model, coordinates of each dummy atom and pdb file header.
Parameters: filename – name of the pdb file to read
-
save
(filename)[source]¶ Save the position of each dummy atom in a PDB file.
Parameters: filename – name of the pdb file to write
-
centroid
()[source]¶ Calculate the position of the center of mass of the molecule.
Return self.com: 1d array, coordinates of the center of mass of the molecule
-
inertiatensor
()[source]¶ calculate the inertia tensor of the protein
Return self.inertensor: inertia tensor of the molecule
-
canonical_translate
()[source]¶ Calculate the translation matrix to translate the center of mass of the molecule on the origin of the base.
Return trans: translation matrix
-
canonical_rotate
()[source]¶ Calculate the rotation matrix to align inertia momentum of the molecule on principal axis.
Return rot: rotation matrix det==1
-
canonical_parameters
()[source]¶ Save the 6 canonical parameters of the initial molecule: x0, y0, z0, the position of the center of mass phi, theta, psi, the three Euler angles of the canonical rotation (axis:x,y’,z’‘)
-
calc_invariants
(use_cython=True)[source]¶ - Calculate the invariants of the structure:
- fineness, ie. average distance between an atoms and its nearest neighbor
- radius of gyration of the model
- diameter of the model
Return invariants: 3-tuple containing (fineness, Rg, Dmax)
-
fineness
¶
-
Rg
¶
-
Dmax
¶
-
dist
(other, molecule1, molecule2, use_cython=True)[source]¶ Calculate the distance with another model.
Parameters: - self,other – two SASModel
- molecule1 – 2d array of the position of each atom of the first molecule
- molecule2 – 2d array of the position of each atom of the second molecule
Return D: NSD between the 2 molecules, in their position molecule1 and molecule2
-
transform
(param, symmetry, reverse=None)[source]¶ Calculate the new coordinates of each dummy atoms of the molecule after a transformation defined by six parameters and a symmetry
Parameters: - param – 6 parameters of transformation (3 coordinates of translation, 3 Euler angles)
- symmetry – list of three constants which define a symmetry to apply
Return mol: 2d array, coordinates after transformation
-
dist_after_movement
(param, other, symmetry)[source]¶ The first molecule, molref, is put on its canonical position. The second one, mol2, is moved following the transformation selected
Parameters: - param – list of 6 parameters for the transformation, 3 coordinates of translation and 3 Euler angles
- symmetry – list of three constants which define a symmetry to apply
Return distance: the NSD between the first molecule and the second one after its movement
-
freesas.plot¶
Functions to generating graphs related to SAS.
-
scatter_plot
(data, guinier=None, ift=None, filename=None, img_format='svg', unit='nm', title='Scattering curve', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a scattering plot I = f(q) in semi_log_y.
Parameters: - data – data read from an ASCII file, 3 column (q,I,err)
- filename – name of the file where the cuve should be saved
- img_format – image format
- unit – Unit name for Rg and 1/q
- guinier – output of autoRg
- ift – converged instance of BIFT (output of auto_bift)
- ax – subplot where the plot shall go in
Returns: the matplotlib figure
-
kratky_plot
(data, guinier, filename=None, img_format='svg', unit='nm', title='Dimensionless Kratky plot', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a Kratky plot q²Rg²I/I₀ = f(q·Rg)
Parameters: - data – data read from an ASCII file, 3 column (q,I,err)
- guinier – output of autoRg
- filename – name of the file where the cuve should be saved
- img_format – image format
- unit – Unit name for Rg and 1/q
- ax – subplot where the plot shall go in
Returns: the matplotlib figure
-
guinier_plot
(data, guinier, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a guinier plot: ln(I) = f(q²)
Parameters: - data – data read from an ASCII file, 3 column (q,I,err)
- guinier – A RG_RESULT object from AutoRg
- filename – name of the file where the cuve should be saved
- img_format – image format
Param: ax: subplot where to plot in
Returns: the matplotlib figure
-
density_plot
(ift, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a density plot p(r)
@param ift: An IFT result comming out of BIFT @param filename: name of the file where the cuve should be saved @param img_format: image image format @param ax: subplotib where to plot in @return: the matplotlib figure
freesas.sasio¶
Contains helper functions for loading SAS data from differents sources.
-
load_scattering_data
(filename: Union[os.PathLike, str, bytes]) → numpy.ndarray[source]¶ Load scattering data q, I, err into a numpy array.
Parameters: filename – ASCII file, 3 column (q,I,err) Returns: numpy array with 3 column (q,I,err)
-
parse_ascii_data
(input_file_text: List[str], number_of_columns: int) → numpy.ndarray[source]¶ Parse data from an ascii file into an N column numpy array
Parameters: - input_file_text – List containing one line of input data per element
- number_of_columns – Expected number of lines in the data file
Returns: numpy array with 3 column (q,I,err)