Cell

class simble.cell.Cell(heavy_chain, light_chain, created_at, is_alive=True, location=LocationName.GC, cell_type=CellType.DEFAULT)

Bases: object

Represents a cell in the simulation.

heavy_chain

The heavy chain of the cell.

Type:

HeavyChain

light_chain

The light chain of the cell.

Type:

LightChain

created_at

The generation at which the cell was created.

Type:

int

is_alive

Whether the cell is alive.

Type:

bool

location

The location of the cell.

Type:

LocationName

cell_type

The type of the cell.

Type:

CellType

as_AIRR(generation)

Returns the cell data in AIRR format.

as_fasta(generation)

Returns both chains with cell data in FASTA format.

as_fasta_helper(generation, heavy)

Returns the cell and chain data in FASTA format for a single chain.

calculate_affinity(target_pair)

Calculates the affinity of the cell’s chains to a target pair.

differentiate(cell_type)

Changes the cell type to a different type.

kill_cell()

Marks the cell as dead.

mutate_cell()

Mutates the cell’s heavy and light chains.

remake_self()

Creates a new Cell instance with the same properties.

class simble.cell.CellType(*values)

Bases: Enum

Enum representing different cell types in the simulation.

DEFAULT = 'gc_b_cell'
MBC = 'memory_b_cell'
PC = 'plasma_cell'
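As a minimal sketch, the members listed above behave like any standard Python Enum. The class below is a local stand-in that copies the documented values so the snippet runs without simble installed; it is not the library's own definition.

```python
from enum import Enum


class CellType(Enum):
    """Local stand-in mirroring the documented simble.cell.CellType members."""
    DEFAULT = "gc_b_cell"
    MBC = "memory_b_cell"
    PC = "plasma_cell"


# Look a member up by its string value, e.g. when reading a saved record.
memory = CellType("memory_b_cell")

# Members compare by identity, and .value gives back the stored string.
is_memory = memory is CellType.MBC
labels = [member.value for member in CellType]
```

Looking a member up by value raises ValueError for an unknown string, which is a convenient way to reject unexpected cell type labels.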

class simble.cell.SingleChainCell(heavy_chain, light_chain, created_at, is_alive=True, location=LocationName.GC, cell_type=CellType.DEFAULT)

Bases: Cell

Represents a cell with only one chain (heavy).

as_AIRR(generation)

Returns the cell data in AIRR format.

as_fasta(generation)

Returns the heavy chain with cell data in FASTA format.

calculate_affinity(target_pair)

Calculates the affinity of the cell’s chain to a target pair.

mutate_cell()

Mutates the cell’s heavy chain.

Chain

class simble.chain.Chain(nucleotide_seq, amino_acid_seq=None, nucleotide_gaps=None, mutability_map=None, gapped_seq=None, cdr3_aa_length=13, junction=None)

Bases: object

Represents a chain of the BCR.

nucleotide_seq

The nucleotide sequence of the chain.

Type:

str

amino_acid_seq

The amino acid sequence of the chain.

Type:

str

nucleotide_gaps

A dictionary mapping gap positions to their lengths.

Type:

dict

mutability_map

A list of mutability weights for each nucleotide position.

Type:

list

CDR3_length

The length of the CDR3 region in amino acids.

Type:

int

junction

The junction sequence of the chain.

Type:

str

cdr_similarity

Similarity of the CDR regions to the target.

Type:

float

fwr_similarity

Similarity of the FWR regions to the target.

Type:

float

similarity

Overall similarity of the chain to the target.

Type:

float

abstract property IS_HEAVY

Returns whether the chain is a heavy chain.

as_AIRR(generation)

Generates a dictionary representation of the chain in AIRR format.

calculate_affinity(target_pair)

Calculates the affinity of the chain to a target pair.

copy()

Creates a deep copy of the Chain object.

create_mutability_map()

Creates a mutability map based on the nucleotide sequence.

get_functionality()

Checks if the chain is functional based on its amino acid sequence.

get_gapped_sequence()

Returns the nucleotide sequence with gaps represented by ‘.’ characters.

get_observed_mutations(germline_gapped, targets)

Calculates the observed mutations in the chain compared to a germline sequence.

Parameters:
  • germline_gapped (str) – The gapped germline sequence to compare against.

  • targets (list) – A list of target positions to exclude from the mutation count.

Returns:

A tuple containing the observed mutations, filtered mutations, CDR mutations, and FWR mutations.

Return type:

tuple

abstractmethod get_target_from_pair(target_pair)

Returns the appropriate target from a TargetAminoPair based on the chain type.

property is_functional

Checks if the chain is functional based on its amino acid sequence.

property junction

Returns the junction nucleotide sequence of the chain.

property junction_aa

Returns the junction amino acid sequence of the chain.

mutate(cell_mutation_rate, n=None)

Mutates the chain based on a mutation rate and returns the number of mutations.

Parameters:
  • cell_mutation_rate (float) – The mutation rate for the cell.

  • n (int, optional) – The number of mutations to perform. If None, it will be sampled from a Poisson distribution.

Returns:

The number of mutations performed.

Return type:

int
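The optional n argument described above can be illustrated with a short sketch. It assumes the Poisson mean is the mutation rate multiplied by the sequence length, which is not stated in this reference, and the function names sample_poisson and number_of_mutations are hypothetical. Only the standard library is used.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def number_of_mutations(rate_per_site, sequence_length, rng, n=None):
    """Return n if given; otherwise sample from Poisson(rate_per_site * sequence_length).

    The choice of mean is an assumption made for illustration only.
    """
    if n is not None:
        return n
    return sample_poisson(rate_per_site * sequence_length, rng)


rng = random.Random(42)
fixed = number_of_mutations(0.001, 400, rng, n=3)  # explicit count is used as-is
draws = [number_of_mutations(0.001, 400, rng) for _ in range(1000)]
mean_draw = sum(draws) / len(draws)  # should be close to the assumed mean of 0.4
```

Passing an explicit n is useful for tests, because the number of mutations is then fixed and does not depend on the random draw.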

property mutate_probability

Returns the mutation probability for the chain.

abstract property shm_per_site

Returns the mutation rate per site per generation for the chain.

update_mutability_map(mutated_positions)

Updates the mutability map based on mutated positions.

class simble.chain.EmptyChain

Bases: Chain

Represents an empty chain, used when no chain is available.

property IS_HEAVY

Returns whether the chain is a heavy chain.

as_AIRR(generation)

Returns an empty dictionary for AIRR format since this chain has no sequence.

calculate_affinity(target_pair)

Returns 1 as the affinity for an empty chain, indicating no effect on binding.

copy()

Creates a new EmptyChain object.

get_observed_mutations(germline_gapped, targets)

Reports no observed, filtered, CDR, or FWR mutations, since the chain has no sequence.

get_target_from_pair(target_pair)

Returns the appropriate target from a TargetAminoPair based on the chain type.

mutate(cell_mutation_rate, n=None)

Does nothing for EmptyChain, as it has no sequence to mutate.

property shm_per_site

Returns the mutation rate per site per generation for the chain.

class simble.chain.HeavyChain(*args, **kwargs)

Bases: Chain

Represents an IGH (heavy) chain of the BCR.

shm_per_site

Mutation rate per site per generation for the heavy chain.

Type:

float

IS_HEAVY

Indicates that this is a heavy chain.

Type:

bool

property IS_HEAVY

Returns whether the chain is a heavy chain.

get_target_from_pair(target_pair)

Returns the appropriate target from a TargetAminoPair based on the chain type.

property shm_per_site

Returns the mutation rate per site per generation for the chain.

class simble.chain.LightChain(*args, **kwargs)

Bases: Chain

Represents an IGL or IGK (light) chain of the BCR.

shm_per_site

Mutation rate per site per generation for the light chain.

Type:

float

IS_HEAVY

Indicates that this is NOT a heavy chain.

Type:

bool

property IS_HEAVY

Returns whether the chain is a heavy chain.

get_target_from_pair(target_pair)

Returns the appropriate target from a TargetAminoPair based on the chain type.

property shm_per_site

Returns the mutation rate per site per generation for the chain.

Location

class simble.location.Location(name, settings)

Bases: object

Represents a location in the simulation.

name

The name of the location.

Type:

str

settings

The settings for the location.

Type:

LocationSettings

current_generation

The current population in the location.

Type:

list

immigrating_population

The population that is immigrating to the location.

Type:

list

number_of_children

The number of children produced by the population.

Type:

list

finish_migration()

Finalizes the migration of cells to this location.

update_cell(node)

Updates the cell’s location and mutation rate.

class simble.location.LocationName(*values)

Bases: Enum

Enum representing different locations in the simulation.

GC = 'germinal_center'
OTHER = 'other'

encode()

Encodes the enum as a dictionary for serialization.

simble.location.as_enum(d)

Converts a dictionary (e.g. from json) to an enum if it contains an encoded enum.
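The encode() and as_enum() pair suggests a JSON round trip. The sketch below shows one common way such a round trip can work. The "__enum__" key and the "ClassName.MEMBER" string layout are assumptions, not the documented wire format, and LocationName here is a local stand-in that mirrors the members listed above.

```python
import json
from enum import Enum


class LocationName(Enum):
    """Local stand-in mirroring the documented LocationName members."""
    GC = "germinal_center"
    OTHER = "other"

    def encode(self):
        # Assumed dictionary layout; the real encoding may differ.
        return {"__enum__": f"{type(self).__name__}.{self.name}"}


def as_enum(d):
    """object_hook for json.loads: turn an encoded enum dict back into a member."""
    if "__enum__" in d:
        class_name, member_name = d["__enum__"].split(".")
        if class_name == LocationName.__name__:
            return LocationName[member_name]
    return d


text = json.dumps({"location": LocationName.GC.encode()})
restored = json.loads(text, object_hook=as_enum)
```

Because json.loads calls object_hook on every decoded dictionary, innermost first, dictionaries without the marker key pass through unchanged.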

Simble

class simble.simble.TqdmLoggingHandler(level=0)

Bases: Handler

Custom logging handler to write logs to tqdm output.

emit(record)

Writes the specified logging record to tqdm output.

simble.simble.do_simulation(i, seed, filename)

Runs a single simulation with the given seed and settings.

simble.simble.main()

Main function to run the simulation.

simble.simble.process_results(results)

Processes the results of the simulations and saves them to files.

simble.simble.set_logger()

Sets up the logger for the simulation.

Simulation

simble.simulation.do_differentiation(location, time)

Handles the differentiation of cells as they leave a location. Currently, this is only implemented for the germinal center (GC) location.

Parameters:
  • location (Location) – The germinal center location.

  • time (int) – The current time in the simulation.

Returns:

A list of nodes that are migrating out of the germinal center.

Return type:

list

simble.simulation.get_population_data(location, time)

Calculates population data for a given location at a specific time.

Parameters:
  • location (Location) – The location for which to calculate population data.

  • time (int) – The current time in the simulation.

Returns:

A dictionary containing population data, including the number of cells with children.

Return type:

dict

simble.simulation.non_gc_population_control(current_generation)

Handles population control for non-GC locations.

Parameters:

current_generation (list) – The current population in the non-GC location.

Returns:

A new generation of nodes, where each node is a child of the original nodes.

Return type:

list

simble.simulation.run_simulation(i, result_dir)

Runs the simulation for a single iteration.

Parameters:
  • i (int) – The iteration number of the simulation.

  • result_dir (str) – The directory where results will be saved.

Returns:

A dictionary containing the results of the simulation, including AIRR data, FASTA sequences, trees, and population data.

Return type:

dict

simble.simulation.simulate(clone_id, TARGET_PAIR, gc_start_generation, root, time=0)

Runs the simulation for a single clone.

Parameters:
  • clone_id (int) – The ID of the clone.

  • TARGET_PAIR (TargetAminoPair) – The target amino acid pair for the simulation.

  • gc_start_generation (list) – The initial population in the germinal center.

  • root (Node) – The root node of the simulation tree.

  • time (int) – The current time in the simulation.

Returns:

A tuple containing the sampled nodes, population data, and development data.

Return type:

tuple

Target

class simble.target.TargetAminoAcid(gapped_nucleotide_seq, cdr3_length)

Bases: object

Represents a target amino acid sequence.

gapped_nucleotide_seq

The gapped nucleotide sequence of the target.

Type:

str

CDR_POSITIONS

The positions of the CDR regions in the amino acid sequence.

Type:

list

amino_acid_seq

The amino acid sequence derived from the gapped nucleotide sequence.

Type:

str

mutation_locations

The positions of mutations from germline in the target amino acid sequence.

Type:

list

all_multipliers

A dictionary of multipliers for each position in the amino acid sequence.

Type:

dict

cdr_multipliers

A dictionary of multipliers specifically for CDR positions.

Type:

dict

fwr_multipliers

A dictionary of multipliers specifically for FWR positions.

Type:

dict

choose_replacement_nucleotide(codon, curr_amino_acid)

Chooses a replacement nucleotide for a codon that results in a different amino acid.

Parameters:
  • codon (str) – The codon to mutate.

  • curr_amino_acid (str) – The current amino acid represented by the codon.

Returns:

A tuple containing the new codon and the new amino acid.

Return type:

tuple
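One way to picture this method is to enumerate every single-nucleotide change to the codon and keep those that alter the amino acid. The sketch below does that with the standard genetic code. Excluding stop codons and choosing uniformly among candidates are assumptions, and choose_replacement is a hypothetical standalone function, not the method itself.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def choose_replacement(codon, curr_amino_acid, rng):
    """Return (new_codon, new_amino_acid) differing from codon at exactly one base."""
    candidates = []
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            new_codon = codon[:position] + base + codon[position + 1:]
            new_amino = CODON_TABLE[new_codon]
            # Assumption: skip synonymous changes and stop codons.
            if new_amino != curr_amino_acid and new_amino != "*":
                candidates.append((new_codon, new_amino))
    return rng.choice(candidates)


rng = random.Random(7)
new_codon, new_amino = choose_replacement("ATG", "M", rng)
```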

property max_affinity

Calculates the maximum affinity of the target amino acid sequence.

mutate(n)

Mutates the target amino acid sequence by replacing nucleotides.

Parameters:

n (int) – The number of mutations to apply.

class simble.target.TargetAminoPair(heavy_gapped_nucleotide, light_gapped_nucleotide, heavy_cdr3_length, light_cdr3_length)

Bases: object

Represents a pair of target amino acids for heavy and light chains.

heavy

The target amino acid for the heavy chain.

Type:

TargetAminoAcid

light

The target amino acid for the light chain.

Type:

TargetAminoAcid

property max_affinity

Calculates the maximum affinity of the target pair.

mutate(heavy_n, light_n)

Creates target mutations in the target amino acid chains.

Parameters:
  • heavy_n (int) – The number of mutations to apply to the heavy chain.

  • light_n (int) – The number of mutations to apply to the light chain.

Tree

class simble.tree.Node(cell, parent=None, heavy_mutations=0, light_mutations=0, generation=0, clone_id=None)

Bases: object

Represents a node in the simulation tree.

cell

The cell associated with this node.

Type:

Cell

parent

The parent node in the tree.

Type:

Node

heavy_mutations

The number of heavy chain mutations.

Type:

int

light_mutations

The number of light chain mutations.

Type:

int

generation

The generation of the node.

Type:

int

clone_id

The unique identifier for the clone.

Type:

int

children

The list of child nodes.

Type:

list

antigen

The antigen bound to the cell at this time point.

Type:

int

sampled_time

The time at which the node was sampled.

Type:

int

last_migration

The last migration time of the node’s ancestors.

Type:

int

add_child(child)

Adds a child node to this node.

Parameters:

child (Node) – The child node to add.

copy()

Creates a copy of the node.

Returns:

A new Node instance with the same properties as this node.

Return type:

Node

property occupancy

Calculates the occupancy in this node’s current location based on its generation and last migration.

property occupancy_other

Calculates the occupancy of the node in the ‘other’ location.

prune_subtree(to_keep)

Prunes the subtree to keep only nodes with IDs in the to_keep set.

Parameters:

to_keep (set) – A set of IDs to keep in the subtree.

Returns:

A new Node instance representing the pruned subtree.

Return type:

Node

prune_up_tree()

Prunes the tree upwards, removing this node and its ancestors if they have no children.

property time_since_last_split

Calculates the time since the last split in the tree.

write_newick(time_tree=False)

Writes the node and its children in Newick format.

Parameters:

time_tree (bool) – Whether to write the tree with time information.

Returns:

The Newick representation of the node and its children.

Return type:

str
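For readers unfamiliar with Newick output, the following sketch writes a small tree recursively. MiniNode is a hypothetical stand-in with a label and a single branch length; the real Node carries more state, and how it chooses between mutation-based and time-based branch lengths is not shown here.

```python
class MiniNode:
    """Hypothetical stand-in for a tree node: a label, a branch length, children."""

    def __init__(self, label, branch_length=0):
        self.label = label
        self.branch_length = branch_length
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def to_newick(self):
        """Return the subtree rooted here as a Newick fragment (no trailing ';')."""
        text = f"{self.label}:{self.branch_length}"
        if self.children:
            inner = ",".join(child.to_newick() for child in self.children)
            text = f"({inner}){text}"
        return text


root = MiniNode("root")
left = MiniNode("a", 2)
right = MiniNode("b", 3)
leaf = MiniNode("c", 1)
root.add_child(left)
root.add_child(right)
left.add_child(leaf)

# A complete Newick string ends with a semicolon.
newick = root.to_newick() + ";"
```

Each node contributes its children in parentheses followed by its own label and branch length, so the nesting of parentheses mirrors the nesting of the tree.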

write_newick_node(time_tree=False, subtrees=None)

Writes the node in Newick format.

Parameters:
  • time_tree (bool) – Whether to write the tree with time information.

  • subtrees (list) – A list of Newick strings for the children.

Returns:

The Newick representation of the node and, if subtrees’ Newick strings are provided, its children.

Return type:

str

simble.tree.simplify_tree(root)

Simplifies the tree by removing nodes with only one child.

Parameters:

root (Node) – The root node of the tree.

Returns:

A new Node instance representing the simplified tree.

Return type:

Node
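Collapsing single-child nodes can be sketched as below. The sketch assumes a single-child root is collapsed as well and ignores branch lengths, which a real implementation would presumably need to merge. MiniNode, simplify, and as_nested are hypothetical names used only for illustration.

```python
class MiniNode:
    """Hypothetical stand-in for a tree node with a label and children."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def simplify(node):
    """Return a copy of the tree in which single-child nodes are skipped."""
    # Follow a chain of single-child nodes down to the first branching node or leaf.
    while len(node.children) == 1:
        node = node.children[0]
    return MiniNode(node.label, [simplify(child) for child in node.children])


def as_nested(node):
    """Convert a tree to nested (label, children) tuples for easy comparison."""
    return (node.label, [as_nested(child) for child in node.children])


# root -> x -> (a, y -> b): root and y each have exactly one child.
tree = MiniNode("root", [MiniNode("x", [MiniNode("a"), MiniNode("y", [MiniNode("b")])])])
simplified = as_nested(simplify(tree))
```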

Settings

class simble.settings.Encodable

Bases: object

Base class for objects that can be encoded to a dictionary.

encode()

Encodes the object as a dictionary for serialization.

class simble.settings.LocationSettings(name, sample_times=None, mutation_rate=None, max_population=1000, migration_rate=0, sample_size=None)

Bases: Encodable

Settings for a specific location in the simulation.

name

The name of the location.

Type:

LocationName

sample_times

Times at which samples are taken.

Type:

list

mutation_rate

The mutation rate for the location.

Type:

float

max_population

The maximum population allowed in the location.

Type:

int

migration_rate

The rate of migration out of this location.

Type:

float

sample_size

The number of cells to sample from the location.

Type:

int

class simble.settings.Settings

Bases: Encodable

Global settings for the simulation.

LOCATIONS

List of LocationSettings for different locations.

Type:

list

HEAVY_SHM_PER_SITE

SHM rate per site of the heavy chain.

Type:

float

LIGHT_SHM_PER_SITE

SHM rate per site of the light chain.

Type:

float

TARGET_MUTATIONS_HEAVY

Number of target mutations for heavy chain.

Type:

int

TARGET_MUTATIONS_LIGHT

Number of target mutations for light chain.

Type:

int

SELECTION

Whether selection is applied in the simulation.

Type:

bool

UNIFORM

Whether to use a uniform mutation model.

Type:

bool

RESULTS_DIR

Directory for saving results.

Type:

str

MULTIPLIER

Multiplier for affinity calculations.

Type:

float

_x_RNG

Random number generator instance.

Type:

random.Random

DEV

Development mode flag.

Type:

bool

FASTA

Whether to output results in FASTA format.

Type:

bool

VERBOSE

Whether verbose logging is enabled.

Type:

bool

CDR_DIST

Distribution type for CDR mutations.

Type:

str

CDR_VAR

Variance for CDR mutations.

Type:

float

FWR_DIST

Distribution type for FWR mutations.

Type:

str

FWR_VAR

Variance for FWR mutations.

Type:

float

TIME_SWITCH

Time at which to switch locations.

Type:

int

GENERATIONS_PER_DAY

Number of generations per day.

Type:

float

MEMORY_SAVE

Whether to save memory during the simulation.

Type:

bool

KEEP_FULL_TREE

Whether to keep the full tree of cells.

Type:

bool

QUIET

Whether to suppress output.

Type:

bool

property END_TIME

Calculates the end time of the simulation based on sample times.

property RNG

Returns the random number generator instance.

property SELECTION

Returns whether selection is applied in the simulation.

property UNIFORM

Returns whether a uniform mutation model is used.

update_from_dict(dictionary)

Updates the settings from a dictionary.

Helper

class simble.helper.StartChain(nucleotide_seq, gapped_seq, cdr3_aa_length, junction)

Bases: tuple

cdr3_aa_length

Alias for field number 2

gapped_seq

Alias for field number 1

junction

Alias for field number 3

nucleotide_seq

Alias for field number 0
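The "Alias for field number" entries are what a named tuple produces, so StartChain can be pictured as in the sketch below. The definition is a local stand-in that follows the documented field order, and the values are placeholders rather than real germline data.

```python
from collections import namedtuple

# Field order follows the documented aliases (field numbers 0 to 3).
StartChain = namedtuple(
    "StartChain", ["nucleotide_seq", "gapped_seq", "cdr3_aa_length", "junction"]
)

# Placeholder values for illustration only.
start = StartChain(
    nucleotide_seq="ATGGCC",
    gapped_seq="ATG...GCC",
    cdr3_aa_length=13,
    junction="TGT",
)

by_name = start.cdr3_aa_length
by_index = start[2]
nucleotides, gapped, cdr3_len, junction = start  # tuple unpacking also works
```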

class simble.helper.StartConstants(chain, constants)

Bases: tuple

chain

Alias for field number 0

constants

Alias for field number 1

simble.helper.codon_to_amino_acid(codon)

Converts a codon (3-nucleotide sequence) to its corresponding amino acid.

Parameters:

codon (str) – A 3-nucleotide sequence representing a codon.

Returns:

The corresponding amino acid represented by the codon.

Return type:

str

simble.helper.get_data(path)

Returns the absolute path to a data file in the simble package.

Parameters:

path (str) – The relative path to the data file.

Returns:

The absolute path to the data file.

Return type:

str

simble.helper.get_mutability_of_kmer(kmer, heavy=True)

Gets the mutability of a given kmer.

Parameters:
  • kmer (str) – The 5-mer sequence for which to get mutability.

  • heavy (bool) – Whether to use the heavy chain mutability table.

Returns:

The mutability value for the kmer.

Return type:

float
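To show how per-5-mer mutability values can become per-position weights, the sketch below slides a 5-nucleotide window along a sequence and then picks a position in proportion to its weight. The lookup table, the fallback value, and the handling of edge positions are all hypothetical; the real tables and edge rules are not described in this reference.

```python
import random

# Hypothetical mutability values for a few 5-mers; real tables hold many more entries.
TOY_MUTABILITY = {"AAGCT": 0.9, "AGCTA": 0.5, "GCTAC": 0.1}
DEFAULT_MUTABILITY = 0.2  # assumed fallback for 5-mers missing from the toy table


def mutability_map(sequence):
    """Return one weight per position, looked up from the 5-mer centred on it."""
    weights = []
    for i in range(len(sequence)):
        if i < 2 or i > len(sequence) - 3:
            # Edge positions have no full 5-mer; using the fallback is an assumption.
            weights.append(DEFAULT_MUTABILITY)
        else:
            kmer = sequence[i - 2:i + 3]
            weights.append(TOY_MUTABILITY.get(kmer, DEFAULT_MUTABILITY))
    return weights


sequence = "AAGCTAC"
weights = mutability_map(sequence)
rng = random.Random(0)
# Positions with larger weights are proportionally more likely to be picked.
position = rng.choices(range(len(sequence)), weights=weights, k=1)[0]
```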

simble.helper.get_random_start_pair()

Generates a random start pair of heavy and light chains.

Returns:

A named tuple containing the heavy and light chains.

Return type:

StartPair

simble.helper.get_substitution_probability(kmer, heavy=True)

Gets the substitution probabilities for a given kmer.

Parameters:
  • kmer (str) – The 5-mer sequence for which to get substitution probabilities.

  • heavy (bool) – Whether to use the heavy chain substitution table.

Returns:

A list of probabilities for each nucleotide substitution (A, C, G, T).

Return type:

list
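A list ordered as A, C, G, T can be used directly as sampling weights. The sketch below draws a replacement nucleotide with random.choices from the standard library; the probability values are invented for illustration and substitute is a hypothetical helper.

```python
import random

NUCLEOTIDES = ["A", "C", "G", "T"]


def substitute(probabilities, rng):
    """Pick a replacement nucleotide using weights ordered as A, C, G, T."""
    if len(probabilities) != len(NUCLEOTIDES):
        raise ValueError("expected one probability per nucleotide (A, C, G, T)")
    return rng.choices(NUCLEOTIDES, weights=probabilities, k=1)[0]


# Hypothetical probabilities for a 5-mer whose central base is G:
# the entry for G itself is zero, so the base always changes.
probabilities = [0.6, 0.1, 0.0, 0.3]
rng = random.Random(1)
picks = [substitute(probabilities, rng) for _ in range(200)]
```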

simble.helper.make_all_plots(df, result_dir, simulation=False)

Creates plots for all columns in the DataFrame and saves them to files.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the data to plot.

  • result_dir (str) – The directory to save the plots.

  • simulation (bool) – Whether the plots are for a simulation (affects title).

simble.helper.make_bar_plot(data, results_file, xlabel, title)

Creates a bar plot of the given data and saves it to a file.

Parameters:
  • data (np.ndarray) – The data to plot.

  • results_file (str) – The file path to save the plot.

  • xlabel (str) – The label for the x-axis.

  • title (str) – The title of the plot.

simble.helper.make_plot(data, times, results_file, ylabel, title, log=False)

Creates a plot of the given data and saves it to a file.

Parameters:
  • data (np.ndarray) – The data to plot.

  • times (np.ndarray) – The time points corresponding to the data.

  • results_file (str) – The file path to save the plot.

  • ylabel (str) – The label for the y-axis.

  • title (str) – The title of the plot.

  • log (bool) – Whether to use a logarithmic scale for the y-axis.

simble.helper.read_sf5_table(filename)

Reads a CSV file containing the SF5 mutability table.

Parameters:

filename (str) – The path to the CSV file.

Returns:

A DataFrame containing the mutability data.

Return type:

pd.DataFrame

simble.helper.remove_gaps(aligned)

Removes gaps from an aligned sequence.

Parameters:

aligned (str) – The aligned sequence with gaps.

Returns:

The aligned sequence with gaps removed.

Return type:

str
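A minimal sketch of gap removal follows. It treats ‘.’ as the gap character, matching the gapped sequences described for Chain, and also strips ‘-’, which is an assumption. The name strip_gaps is hypothetical, chosen to avoid implying this is the library's implementation.

```python
def strip_gaps(aligned, gap_characters=".-"):
    """Return the sequence with every gap character dropped."""
    return "".join(base for base in aligned if base not in gap_characters)


gapped = "CAG...GTG---CAG"
ungapped = strip_gaps(gapped)
```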

simble.helper.snake_case_to_normal(name)

Converts a snake_case string to a normal string with spaces.

Parameters:

name (str) – The snake_case string to convert.

Returns:

The converted string with spaces instead of underscores.

Return type:

str

simble.helper.translate_to_amino_acid(nucleotide_seq)

Translates a nucleotide sequence into an amino acid sequence.

Parameters:

nucleotide_seq (str) – The nucleotide sequence to translate.

Returns:

The translated amino acid sequence.

Return type:

str
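Translation can be sketched with the standard genetic code as shown below. Mapping unrecognised codons to 'X' and ignoring a trailing partial codon are assumptions; how simble treats gaps or incomplete codons is not stated in this reference, and translate is a hypothetical standalone function.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotide_seq):
    """Translate complete codons; unknown codons become 'X' (an assumed convention)."""
    seq = nucleotide_seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


protein = translate("ATGTGGGGCTAA")
```

Stop codons appear as '*' in the output, which makes a premature stop easy to spot when checking whether a sequence is still functional.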

Parsing

simble.parsing.get_parser()

Creates and returns an argument parser for the simble program.

Returns:

The argument parser for the simble program.

Return type:

argparse.ArgumentParser

simble.parsing.read_from_json(filename)

Reads a JSON file and returns its contents as a dictionary.

Parameters:

filename (str) – The path to the JSON file.

Returns:

The contents of the JSON file as a dictionary.

Return type:

dict

simble.parsing.validate_and_process_args(args)

Validates and processes command line arguments and updates the simulation settings.

Parameters:

args (argparse.Namespace) – The parsed command line arguments.

Returns:

A list of warnings, if any.

Return type:

list

simble.parsing.validate_json(json_input)

Validates the JSON input against the global settings object.

Parameters:

json_input (dict) – The JSON input to validate.

Raises:

ValueError – If the JSON input contains invalid fields or types.

simble.parsing.validate_location(location)

Validates a location dictionary.

Parameters:

location (dict) – A dictionary representing a location.

Raises:

ValueError – If the location dictionary contains invalid fields or types.

simble.parsing.validate_samples(sample_info)

Validates the sampling settings.

Parameters:

sample_info (list) – A list containing start, stop, and step values.

Raises:

ValueError – If the sampling setting is invalid.
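The kind of check this function performs can be sketched as follows. The specific rules (integer values, non-negative start, positive step, inclusive stop) are assumptions chosen for illustration, and check_samples is a hypothetical name; only the start, stop, step layout and the ValueError come from the description above.

```python
def check_samples(sample_info):
    """Validate [start, stop, step] and return the implied sample times."""
    if len(sample_info) != 3:
        raise ValueError("expected exactly three values: start, stop, step")
    start, stop, step = sample_info
    if not all(isinstance(value, int) for value in (start, stop, step)):
        raise ValueError("start, stop, and step must be integers")
    if start < 0 or step <= 0 or stop < start:
        raise ValueError("need 0 <= start <= stop and step > 0")
    # Assumption: stop is inclusive; the real convention is not documented here.
    return list(range(start, stop + 1, step))


times = check_samples([0, 200, 50])

try:
    check_samples([0, 200, 0])
    rejected = False
except ValueError:
    rejected = True
```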