Cell
- class simble.cell.Cell(heavy_chain, light_chain, created_at, is_alive=True, location=LocationName.GC, cell_type=CellType.DEFAULT)
Bases:
object
Represents a cell in the simulation.
- heavy_chain
The heavy chain of the cell.
- Type:
- light_chain
The light chain of the cell.
- Type:
- created_at
The generation at which the cell was created.
- Type:
int
- is_alive
Whether the cell is alive.
- Type:
bool
- location
The location of the cell.
- Type:
- as_AIRR(generation)
Returns the cell data in AIRR format.
- as_fasta(generation)
Returns both chains with cell data in FASTA format.
- as_fasta_helper(generation, heavy)
Returns the cell and chain data in FASTA format for a single chain.
- calculate_affinity(target_pair)
Calculates the affinity of the cell’s chains to a target pair.
- differentiate(cell_type)
Changes the cell type to a different type.
- kill_cell()
Marks the cell as dead.
- mutate_cell()
Mutates the cell’s heavy and light chains.
- remake_self()
Creates a new Cell instance with the same properties.
- class simble.cell.CellType(*values)
Bases:
Enum
Enum representing different cell types in the simulation.
- DEFAULT = 'gc_b_cell'
- MBC = 'memory_b_cell'
- PC = 'plasma_cell'
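As an illustration of how the string values listed above map back to members, here is a minimal stand-in enum. It mirrors the documented names and values but is defined locally rather than imported from simble, so treat it as a sketch of standard `Enum` behaviour, not as the package's own code.

```python
from enum import Enum


class CellType(Enum):
    """Local stand-in mirroring the documented members (not imported from simble)."""
    DEFAULT = "gc_b_cell"
    MBC = "memory_b_cell"
    PC = "plasma_cell"


# Look a member up by its string value, e.g. when reading saved output.
cell_type = CellType("plasma_cell")
# .value returns the string label associated with the member.
label = cell_type.value
```

Looking a member up by value raises `ValueError` for an unknown string, which can be a convenient check when parsing output files.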
- class simble.cell.SingleChainCell(heavy_chain, light_chain, created_at, is_alive=True, location=LocationName.GC, cell_type=CellType.DEFAULT)
Bases:
Cell
Represents a cell with only one chain (heavy).
- as_AIRR(generation)
Returns the cell data in AIRR format.
- as_fasta(generation)
Returns both chains with cell data in FASTA format.
- calculate_affinity(target_pair)
Calculates the affinity of the cell’s chain to a target pair.
- mutate_cell()
Mutates the cell’s heavy chain.
Chain
- class simble.chain.Chain(nucleotide_seq, amino_acid_seq=None, nucleotide_gaps=None, mutability_map=None, gapped_seq=None, cdr3_aa_length=13, junction=None)
Bases:
object
Represents a chain of the BCR.
- nucleotide_seq
The nucleotide sequence of the chain.
- Type:
str
- amino_acid_seq
The amino acid sequence of the chain.
- Type:
str
- nucleotide_gaps
A dictionary mapping gap positions to their lengths.
- Type:
dict
- mutability_map
A list of mutability weights for each nucleotide position.
- Type:
list
- CDR3_length
The length of the CDR3 region in amino acids.
- Type:
int
- junction
The junction sequence of the chain.
- Type:
str
- cdr_similarity
Similarity of the CDR regions to the target.
- Type:
float
- fwr_similarity
Similarity of the FWR regions to the target.
- Type:
float
- similarity
Overall similarity of the chain to the target.
- Type:
float
- abstract property IS_HEAVY
Returns whether the chain is a heavy chain
- as_AIRR(generation)
Generates a dictionary representation of the chain in AIRR format
- calculate_affinity(target_pair)
Calculates the affinity of the chain to a target pair
- copy()
Creates a deep copy of the Chain object
- create_mutability_map()
Creates a mutability map based on the nucleotide sequence
- get_functionality()
Checks if the chain is functional based on its amino acid sequence
- get_gapped_sequence()
Returns the nucleotide sequence with gaps represented by ‘.’
- get_observed_mutations(germline_gapped, targets)
Calculates the observed mutations in the chain compared to a germline sequence.
- Parameters:
germline_gapped (str) – The gapped germline sequence to compare against.
targets (list) – A list of target positions to exclude from the mutation count.
- Returns:
A tuple containing the observed mutations, filtered mutations, CDR mutations, and FWR mutations.
- Return type:
tuple
- abstractmethod get_target_from_pair(target_pair)
Returns the appropriate target from a TargetAminoPair based on the chain type
- property is_functional
Checks if the chain is functional based on its amino acid sequence.
- property junction
Returns the junction nucleotide sequence of the chain.
- property junction_aa
Returns the junction amino acid sequence of the chain.
- mutate(cell_mutation_rate, n=None)
Mutates the chain based on a mutation rate and returns the number of mutations.
- Parameters:
cell_mutation_rate (float) – The mutation rate for the cell.
n (int, optional) – The number of mutations to perform. If None, it will be sampled from a Poisson distribution.
- Returns:
The number of mutations performed.
- Return type:
int
- property mutate_probability
Returns the mutation probability for the chain
- abstract property shm_per_site
Returns the mutation rate per site per generation for the chain
- update_mutability_map(mutated_positions)
Updates the mutability map based on mutated positions
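The `mutate` entry above states that when `n` is None the number of mutations is sampled from a Poisson distribution. The sketch below shows only that sampling step, using the standard library. The function name `sample_mutation_count`, the `expected` parameter, and the way the rate is derived (a per-site rate times a sequence length) are assumptions for illustration; the documentation does not say how simble computes the Poisson mean.

```python
import math
import random


def sample_mutation_count(expected, rng, n=None):
    """Return n if given, otherwise a Poisson(expected) draw (Knuth's method)."""
    if n is not None:
        return n
    threshold = math.exp(-expected)
    count, running_product = 0, rng.random()
    # Multiply uniform draws until the product falls to the threshold or below.
    while running_product > threshold:
        count += 1
        running_product *= rng.random()
    return count


rng = random.Random(0)
# Hypothetical mean: a per-site rate times a sequence length.
expected = 0.001 * 350
draws = [sample_mutation_count(expected, rng) for _ in range(10000)]
mean = sum(draws) / len(draws)
```

With a small mean such as this one, most draws are zero, so most simulated divisions would leave the chain unchanged.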
- class simble.chain.EmptyChain
Bases:
Chain
Represents an empty chain, used when no chain is available.
- property IS_HEAVY
Returns whether the chain is a heavy chain
- as_AIRR(generation)
Returns an empty dictionary for AIRR format since this chain has no sequence.
- calculate_affinity(target_pair)
Returns 1 as the affinity for an empty chain, indicating no effect on binding.
- copy()
Creates a new EmptyChain object
- get_observed_mutations(germline_gapped, targets)
Returns no observed mutations, filtered mutations, CDR mutations, and FWR mutations since the sequence does not exist.
- get_target_from_pair(target_pair)
Returns the appropriate target from a TargetAminoPair based on the chain type
- mutate(cell_mutation_rate, n=None)
Does nothing for EmptyChain, as it has no sequence to mutate.
- property shm_per_site
Returns the mutation rate per site per generation for the chain
- class simble.chain.HeavyChain(*args, **kwargs)
Bases:
Chain
Represents an IGH (heavy) chain of the BCR.
- shm_per_site
Mutation rate per site per generation for the heavy chain
- Type:
float
- IS_HEAVY
Indicates that this is a heavy chain
- Type:
bool
- property IS_HEAVY
Returns whether the chain is a heavy chain
- get_target_from_pair(target_pair)
Returns the appropriate target from a TargetAminoPair based on the chain type
- property shm_per_site
Returns the mutation rate per site per generation for the chain
- class simble.chain.LightChain(*args, **kwargs)
Bases:
Chain
Represents an IGL or IGK (light) chain of the BCR.
- shm_per_site
Mutation rate per site per generation for the light chain
- Type:
float
- IS_HEAVY
Indicates that this is NOT a heavy chain
- Type:
bool
- property IS_HEAVY
Returns whether the chain is a heavy chain
- get_target_from_pair(target_pair)
Returns the appropriate target from a TargetAminoPair based on the chain type
- property shm_per_site
Returns the mutation rate per site per generation for the chain
Location
- class simble.location.Location(name, settings)
Bases:
object
Represents a location in the simulation.
- name
The name of the location.
- Type:
str
- settings
The settings for the location.
- Type:
- current_generation
The current population in the location.
- Type:
list
- immigrating_population
The population that is immigrating to the location.
- Type:
list
- number_of_children
The number of children produced by the population.
- Type:
list
- finish_migration()
Finalizes the migration of cells to this location.
- update_cell(node)
Updates the cell’s location and mutation rate.
- class simble.location.LocationName(*values)
Bases:
Enum
Enum representing different locations in the simulation.
- GC = 'germinal_center'
- OTHER = 'other'
- encode()
Encodes the enum as a dictionary for serialization.
- simble.location.as_enum(d)
Converts a dictionary (e.g. from json) to an enum if it contains an encoded enum.
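The `encode()` and `as_enum(d)` entries describe a dictionary-based round trip for enums through JSON. One common way to do this is sketched below, with a local stand-in enum. The `"__enum__"` key and the `"ClassName.MEMBER"` string layout are assumptions; the documentation does not specify the encoded form simble uses.

```python
import json
from enum import Enum


class LocationName(Enum):
    """Local stand-in with the documented members and a hypothetical encode()."""
    GC = "germinal_center"
    OTHER = "other"

    def encode(self):
        # Assumed layout: a one-key dict tagging the value as an enum.
        return {"__enum__": f"{type(self).__name__}.{self.name}"}


def as_enum(d):
    """object_hook: turn a tagged dict back into an enum, else pass it through."""
    if "__enum__" in d:
        class_name, member = d["__enum__"].split(".")
        if class_name == "LocationName":
            return LocationName[member]
    return d


text = json.dumps({"name": LocationName.GC.encode(), "max_population": 1000})
restored = json.loads(text, object_hook=as_enum)
```

Because `json.loads` calls the hook on every decoded object, untagged dictionaries must be returned unchanged, as the last line of `as_enum` does.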
Simble
- class simble.simble.TqdmLoggingHandler(level=0)
Bases:
Handler
Custom logging handler to write logs to tqdm output.
- emit(record)
Writes the specified logging record to tqdm output.
This overrides logging.Handler.emit, whose base implementation raises a NotImplementedError.
- simble.simble.do_simulation(i, seed, filename)
Runs a single simulation with the given seed and settings.
- simble.simble.main()
Main function to run the simulation.
- simble.simble.process_results(results)
Processes the results of the simulations and saves them to files.
- simble.simble.set_logger()
Sets up the logger for the simulation.
Simulation
- simble.simulation.do_differentiation(location, time)
Handles the differentiation of cells as they leave a location. Currently, this is only implemented for the germinal center (GC) location.
- Parameters:
location (Location) – The germinal center location.
time (int) – The current time in the simulation.
- Returns:
A list of nodes that are migrating out of the germinal center.
- Return type:
list
- simble.simulation.get_population_data(location, time)
Calculates population data for a given location at a specific time.
- Parameters:
location (Location) – The location for which to calculate population data.
time (int) – The current time in the simulation.
- Returns:
A dictionary containing population data, including the number of cells with children.
- Return type:
dict
- simble.simulation.non_gc_population_control(current_generation)
Handles population control for non-GC locations.
- Parameters:
current_generation (list) – The current population in the non-GC location.
- Returns:
A new generation of nodes, where each node is a child of the original nodes.
- Return type:
list
- simble.simulation.run_simulation(i, result_dir)
Runs the simulation for a single iteration.
- Parameters:
i (int) – The iteration number of the simulation.
result_dir (str) – The directory where results will be saved.
- Returns:
A dictionary containing the results of the simulation, including AIRR data, FASTA sequences, trees, and population data.
- Return type:
dict
- simble.simulation.simulate(clone_id, TARGET_PAIR, gc_start_generation, root, time=0)
Runs the simulation for a single clone.
- Parameters:
clone_id (int) – The ID of the clone.
TARGET_PAIR (TargetAminoPair) – The target amino acid pair for the simulation.
gc_start_generation (list) – The initial population in the germinal center.
root (Node) – The root node of the simulation tree.
time (int) – The current time in the simulation.
- Returns:
A tuple containing the sampled nodes, population data, and development data.
- Return type:
tuple
Target
- class simble.target.TargetAminoAcid(gapped_nucleotide_seq, cdr3_length)
Bases:
object
Represents a target amino acid sequence.
- gapped_nucleotide_seq
The gapped nucleotide sequence of the target.
- Type:
str
- CDR_POSITIONS
The positions of the CDR regions in the amino acid sequence.
- Type:
list
- amino_acid_seq
The amino acid sequence derived from the gapped nucleotide sequence.
- Type:
str
- mutation_locations
The positions of mutations from germline in the target amino acid sequence.
- Type:
list
- all_multipliers
A dictionary of multipliers for each position in the amino acid sequence.
- Type:
dict
- cdr_multipliers
A dictionary of multipliers specifically for CDR positions.
- Type:
dict
- fwr_multipliers
A dictionary of multipliers specifically for FWR positions.
- Type:
dict
- choose_replacement_nucleotide(codon, curr_amino_acid)
Chooses a replacement nucleotide for a codon that results in a different amino acid.
- Parameters:
codon (str) – The codon to mutate.
curr_amino_acid (str) – The current amino acid represented by the codon.
- Returns:
A tuple containing the new codon and the new amino acid.
- Return type:
tuple
- property max_affinity
Calculates the maximum affinity of the target amino acid sequence.
- mutate(n)
Mutates the target amino acid sequence by replacing nucleotides.
- Parameters:
n (int) – The number of mutations to apply.
- class simble.target.TargetAminoPair(heavy_gapped_nucleotide, light_gapped_nucleotide, heavy_cdr3_length, light_cdr3_length)
Bases:
object
Represents a pair of target amino acids for heavy and light chains.
- heavy
The target amino acid for the heavy chain.
- Type:
- light
The target amino acid for the light chain.
- Type:
- property max_affinity
Calculates the maximum affinity of the target pair.
- mutate(heavy_n, light_n)
Creates target mutations in the target amino acid chains.
- Parameters:
heavy_n (int) – The number of mutations to apply to the heavy chain.
light_n (int) – The number of mutations to apply to the light chain.
Tree
- class simble.tree.Node(cell, parent=None, heavy_mutations=0, light_mutations=0, generation=0, clone_id=None)
Bases:
object
Represents a node in the simulation tree.
- heavy_mutations
The number of heavy chain mutations.
- Type:
int
- light_mutations
The number of light chain mutations.
- Type:
int
- generation
The generation of the node.
- Type:
int
- clone_id
The unique identifier for the clone.
- Type:
int
- children
The list of child nodes.
- Type:
list
- antigen
The antigen bound to the cell at this time point.
- Type:
int
- sampled_time
The time at which the node was sampled.
- Type:
int
- last_migration
The last migration time of the node’s ancestors.
- Type:
int
- copy()
Creates a copy of the node.
- Returns:
A new Node instance with the same properties as this node.
- Return type:
- property occupancy
Calculates the occupancy in this node’s current location based on its generation and last migration.
- property occupancy_other
Calculates the occupancy of the node in the ‘other’ location.
- prune_subtree(to_keep)
Prunes the subtree to keep only nodes with IDs in the to_keep set.
- Parameters:
to_keep (set) – A set of IDs to keep in the subtree.
- Returns:
A new Node instance representing the pruned subtree.
- Return type:
- prune_up_tree()
Prunes the tree upwards, removing this node and its ancestors if they have no children.
- property time_since_last_split
Calculates the time since the last split in the tree.
- write_newick(time_tree=False)
Writes the node and its children in Newick format.
- Parameters:
time_tree (bool) – Whether to write the tree with time information.
- Returns:
The Newick representation of the node and its children.
- Return type:
str
- write_newick_node(time_tree=False, subtrees=None)
Writes the node in Newick format.
- Parameters:
time_tree (bool) – Whether to write the tree with time information.
subtrees (list) – A list of Newick strings for the children.
- Returns:
The Newick representation of the node and, if subtrees’ Newick strings are provided, its children.
- Return type:
str
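The split between `write_newick` (a node plus its children) and `write_newick_node` (one node, given its children's strings) can be illustrated with a small recursive formatter. The `TreeNode` class, its `name` and `branch_length` attributes, and the exact label format are hypothetical; they are not simble's `Node` and say nothing about how simble labels nodes or computes branch lengths.

```python
class TreeNode:
    """Minimal stand-in for a tree node; not simble's Node class."""

    def __init__(self, name, branch_length=0, children=None):
        self.name = name
        self.branch_length = branch_length
        self.children = children or []


def newick_node(node, subtrees=None):
    """Format one node, wrapping already-formatted child strings if given."""
    label = f"{node.name}:{node.branch_length}"
    if subtrees:
        return f"({','.join(subtrees)}){label}"
    return label


def newick(node):
    """Recursively format a node and its descendants, ending with ';'."""
    def walk(current):
        return newick_node(current, [walk(child) for child in current.children])
    return walk(node) + ";"


root = TreeNode("root", 0, [TreeNode("a", 2), TreeNode("b", 3, [TreeNode("c", 1)])])
tree_text = newick(root)
```

Keeping the per-node formatting separate from the traversal means the same formatter could be reused by a non-recursive traversal for very deep trees.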
Settings
- class simble.settings.Encodable
Bases:
object
Base class for objects that can be encoded to a dictionary.
- encode()
Encodes the object as a dictionary for serialization.
- class simble.settings.LocationSettings(name, sample_times=None, mutation_rate=None, max_population=1000, migration_rate=0, sample_size=None)
Bases:
Encodable
Settings for a specific location in the simulation.
- name
The name of the location.
- Type:
- sample_times
Times at which samples are taken.
- Type:
list
- mutation_rate
The mutation rate for the location.
- Type:
float
- max_population
The maximum population allowed in the location.
- Type:
int
- migration_rate
The rate of migration out of this location.
- Type:
float
- sample_size
The number of cells to sample from the location.
- Type:
int
- class simble.settings.Settings
Bases:
Encodable
Global settings for the simulation.
- LOCATIONS
List of LocationSettings for different locations.
- Type:
list
- HEAVY_SHM_PER_SITE
SHM rate per site of the heavy chain.
- Type:
float
- LIGHT_SHM_PER_SITE
SHM rate per site of the light chain.
- Type:
float
- TARGET_MUTATIONS_HEAVY
Number of target mutations for heavy chain.
- Type:
int
- TARGET_MUTATIONS_LIGHT
Number of target mutations for light chain.
- Type:
int
- SELECTION
Whether selection is applied in the simulation.
- Type:
bool
- UNIFORM
Whether to use a uniform mutation model.
- Type:
bool
- RESULTS_DIR
Directory for saving results.
- Type:
str
- MULTIPLIER
Multiplier for affinity calculations.
- Type:
float
- _x_RNG
Random number generator instance.
- Type:
random.Random
- DEV
Development mode flag.
- Type:
bool
- FASTA
Whether to output results in FASTA format.
- Type:
bool
- VERBOSE
Whether verbose logging is enabled.
- Type:
bool
- CDR_DIST
Distribution type for CDR mutations.
- Type:
str
- CDR_VAR
Variance for CDR mutations.
- Type:
float
- FWR_DIST
Distribution type for FWR mutations.
- Type:
str
- FWR_VAR
Variance for FWR mutations.
- Type:
float
- TIME_SWITCH
Time at which to switch locations.
- Type:
int
- GENERATIONS_PER_DAY
Number of generations per day.
- Type:
float
- MEMORY_SAVE
Whether to save memory during the simulation.
- Type:
bool
- KEEP_FULL_TREE
Whether to keep the full tree of cells.
- Type:
bool
- QUIET
Whether to suppress output.
- Type:
bool
- property END_TIME
Calculates the end time of the simulation based on sample times.
- property RNG
Returns the random number generator instance.
- property SELECTION
Returns whether selection is applied in the simulation.
- property UNIFORM
Returns whether a uniform mutation model is used.
- update_from_dict(dictionary)
Updates the settings from a dictionary.
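A dictionary-driven update such as `update_from_dict` is often written as a loop over `setattr`. The sketch below uses a hypothetical `DemoSettings` class with three of the attribute names documented for `Settings`; the default values and the policy of rejecting unknown keys are assumptions, not documented simble behaviour.

```python
class DemoSettings:
    """Hypothetical settings holder using a few documented attribute names."""

    def __init__(self):
        # Placeholder defaults chosen for this example only.
        self.SELECTION = True
        self.MULTIPLIER = 2.0
        self.RESULTS_DIR = "results"

    def update_from_dict(self, dictionary):
        # Assumed policy: reject unknown keys instead of silently adding them.
        for key, value in dictionary.items():
            if not hasattr(self, key):
                raise ValueError(f"unknown setting: {key}")
            setattr(self, key, value)


settings = DemoSettings()
settings.update_from_dict({"MULTIPLIER": 1.5, "RESULTS_DIR": "out"})
```

Rejecting unknown keys catches misspelled setting names early, at the cost of requiring every setting to have a default.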
Helper
- class simble.helper.StartChain(nucleotide_seq, gapped_seq, cdr3_aa_length, junction)
Bases:
tuple
- cdr3_aa_length
Alias for field number 2
- gapped_seq
Alias for field number 1
- junction
Alias for field number 3
- nucleotide_seq
Alias for field number 0
- class simble.helper.StartConstants(chain, constants)
Bases:
tuple
- chain
Alias for field number 0
- constants
Alias for field number 1
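The "Bases: tuple" and "Alias for field number" entries are the usual signature of a named tuple, so fields can be read by index or by name. The sketch below rebuilds an equivalent `StartChain` locally with `collections.namedtuple`; the field order follows the documented field numbers, and the values are placeholders, not real germline data.

```python
from collections import namedtuple

# Field order follows the documented aliases (field numbers 0 to 3).
StartChain = namedtuple(
    "StartChain", ["nucleotide_seq", "gapped_seq", "cdr3_aa_length", "junction"]
)

# Placeholder values for illustration only.
chain = StartChain("ATGGCC", "ATG...GCC", 13, "TGTGCC")
by_index = chain[2]
by_name = chain.cdr3_aa_length
```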
- simble.helper.codon_to_amino_acid(codon)
Converts a codon (3-nucleotide sequence) to its corresponding amino acid.
- Parameters:
codon (str) – A 3-nucleotide sequence representing a codon.
- Returns:
The corresponding amino acid represented by the codon.
- Return type:
str
- simble.helper.get_data(path)
Returns the absolute path to a data file in the simble package.
- Parameters:
path (str) – The relative path to the data file.
- Returns:
The absolute path to the data file.
- Return type:
str
- simble.helper.get_mutability_of_kmer(kmer, heavy=True)
Gets the mutability of a given kmer.
- Parameters:
kmer (str) – The 5-mer sequence for which to get mutability.
heavy (bool) – Whether to use the heavy chain mutability table.
- Returns:
The mutability value for the kmer.
- Return type:
float
- simble.helper.get_random_start_pair()
Generates a random start pair of heavy and light chains.
- Returns:
A named tuple containing the heavy and light chains.
- Return type:
StartPair
- simble.helper.get_substitution_probability(kmer, heavy=True)
Gets the substitution probabilities for a given kmer.
- Parameters:
kmer (str) – The 5-mer sequence for which to get substitution probabilities.
heavy (bool) – Whether to use the heavy chain substitution table.
- Returns:
A list of probabilities for each nucleotide substitution (A, C, G, T).
- Return type:
list
- simble.helper.make_all_plots(df, result_dir, simulation=False)
Creates plots for all columns in the DataFrame and saves them to files.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the data to plot.
result_dir (str) – The directory to save the plots.
simulation (bool) – Whether the plots are for a simulation (affects title).
- simble.helper.make_bar_plot(data, results_file, xlabel, title)
Creates a bar plot of the given data and saves it to a file.
- Parameters:
data (np.ndarray) – The data to plot.
results_file (str) – The file path to save the plot.
xlabel (str) – The label for the x-axis.
title (str) – The title of the plot.
- simble.helper.make_plot(data, times, results_file, ylabel, title, log=False)
Creates a plot of the given data and saves it to a file.
- Parameters:
data (np.ndarray) – The data to plot.
times (np.ndarray) – The time points corresponding to the data.
results_file (str) – The file path to save the plot.
ylabel (str) – The label for the y-axis.
title (str) – The title of the plot.
log (bool) – Whether to use a logarithmic scale for the y-axis.
- simble.helper.read_sf5_table(filename)
Reads a CSV file containing the SF5 mutability table.
- Parameters:
filename (str) – The path to the CSV file.
- Returns:
A DataFrame containing the mutability data.
- Return type:
pd.DataFrame
- simble.helper.remove_gaps(aligned)
Removes gaps from an aligned sequence.
- Parameters:
aligned (str) – The aligned sequence with gaps.
- Returns:
The aligned sequence with gaps removed.
- Return type:
str
- simble.helper.snake_case_to_normal(name)
Converts a snake_case string to a normal string with spaces.
- Parameters:
name (str) – The snake_case string to convert.
- Returns:
The converted string with spaces instead of underscores.
- Return type:
str
- simble.helper.translate_to_amino_acid(nucleotide_seq)
Translates a nucleotide sequence into an amino acid sequence.
- Parameters:
nucleotide_seq (str) – The nucleotide sequence to translate.
- Returns:
The translated amino acid sequence.
- Return type:
str
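The gap-removal and translation helpers described above can be sketched with the standard genetic code. This is a self-contained illustration, not simble's implementation: treating both '.' and '-' as gap characters and ignoring a trailing partial codon are assumptions, and the local function names only echo the documented ones.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Standard genetic code in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def remove_gaps(aligned):
    # Assumption: gaps are written as '.' or '-'.
    return aligned.replace(".", "").replace("-", "")


def translate(nucleotide_seq):
    # Assumption: translate complete codons only, ignoring a trailing partial codon.
    usable = len(nucleotide_seq) - len(nucleotide_seq) % 3
    return "".join(CODON_TABLE[nucleotide_seq[i:i + 3]] for i in range(0, usable, 3))


protein = translate(remove_gaps("ATG...GCCTGGTAA"))
```

A codon containing a character outside A, C, G, and T (for example N) raises `KeyError` in this sketch; a real pipeline would need a policy for ambiguous bases.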
Parsing
- simble.parsing.get_parser()
Creates and returns an argument parser for the simble program.
- Returns:
The argument parser for the simble program.
- Return type:
argparse.ArgumentParser
- simble.parsing.read_from_json(filename)
Reads a JSON file and returns its contents as a dictionary.
- Parameters:
filename (str) – The path to the JSON file.
- Returns:
The contents of the JSON file as a dictionary.
- Return type:
dict
- simble.parsing.validate_and_process_args(args)
Validates and processes command line arguments and updates the simulation settings.
- Parameters:
args (argparse.Namespace) – The parsed command line arguments.
- Returns:
A list of warnings, if any.
- Return type:
list
- simble.parsing.validate_json(json_input)
Validates the JSON input against the global settings object.
- Parameters:
json_input (dict) – The JSON input to validate.
- Raises:
ValueError – If the JSON input contains invalid fields or types.
- simble.parsing.validate_location(location)
Validates a location dictionary.
- Parameters:
location (dict) – A dictionary representing a location.
- Raises:
ValueError – If the location dictionary contains invalid fields or types.
- simble.parsing.validate_samples(sample_info)
Validates the sampling settings.
- Parameters:
sample_info (list) – A list containing start, stop, and step values.
- Raises:
ValueError – If the sampling setting is invalid.
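`validate_samples` is documented as taking a list of start, stop, and step values and raising `ValueError` when the setting is invalid. The sketch below shows one plausible set of checks. The specific rules (integers only, positive step, start not after stop) and the `expand_samples` helper are assumptions for illustration; the documentation does not list the conditions simble enforces.

```python
def validate_samples(sample_info):
    """Check a [start, stop, step] list; the specific rules are assumptions."""
    if len(sample_info) != 3:
        raise ValueError("expected exactly three values: start, stop, step")
    start, stop, step = sample_info
    # bool is a subclass of int, so exclude it explicitly.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in sample_info):
        raise ValueError("start, stop, and step must be integers")
    if step <= 0 or start < 0 or stop < start:
        raise ValueError("need step > 0 and 0 <= start <= stop")


def expand_samples(sample_info):
    """Hypothetical helper: list the sample times, including stop."""
    validate_samples(sample_info)
    start, stop, step = sample_info
    return list(range(start, stop + 1, step))


times = expand_samples([50, 200, 50])
```

Validating before expanding keeps a zero or negative step from producing an empty or misleading list of sample times.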