7.4. Tests¶

Humans make mistakes.
Thus almost any analysis code will include mistakes.
This is true whether the code is written in Unix shell, R, Perl, Python, Ruby, Node, ...
To increase the robustness of our analyses, we must become better at detecting these mistakes.
Every function and method of every piece of code should be tested on test data (including edge cases) to ensure that the results are as expected.
Two complementary approaches are functional and unit testing.

Unit tests: Unit tests are written from a programmer's perspective. They ensure that a particular method of a class successfully performs a set of specific tasks. Each test confirms that a method produces the expected output when given a known input.
Functional tests: Functional tests are written from a user's perspective. These tests confirm that the system does what users are expecting it to.
The development of a system is often likened to the building of a house. While this analogy isn't quite correct, we can extend it to understand the difference between unit and functional tests.
Unit testing is analogous to a building inspector visiting a house's construction site. He is focused on the various internal systems of the house: the foundation, framing, electrical, plumbing, and so on. He ensures (tests) that the parts of the house will work correctly and safely, that is, meet the building code.
Functional tests in this scenario are analogous to the homeowner visiting the same construction site. He assumes that the internal systems will behave appropriately and that the building inspector is performing his task; he is focused on what it will be like to live in this house. How does the house look, are the various rooms a comfortable size, does the house fit the family's needs, are the windows in a good spot to catch the morning sun? The homeowner is performing functional tests on the house: he has the user's perspective. The building inspector is performing unit tests on the house: he has the builder's perspective.
Because both types of tests are necessary, you'll need guidelines for writing them.
Writing a suite of maintainable, automated tests without a testing framework is virtually impossible. So choose a testing framework. (https://www.softwaretestingtricks.com/2007/01/unit-testing-versus-functional-tests.html)
7.4.1. Invisible mistakes can be costly¶
Crucially, data analysis problems can be invisible: the analysis runs, the results seem biologically meaningful and are wonderfully interpretable, but they may in fact be completely wrong. Geoffrey Chang's story is an emblematic example. By the mid-2000s he was a young superstar professor crystallographer, having won prestigious awards and published high-profile papers providing 3D-structures of important proteins. For example:
Science (2001) Chang & Roth. Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters.
Journal of Molecular Biology (2003) Chang. Structure of MsbA from Vibrio cholera: a multidrug resistance ABC transporter homolog in a closed conformation.
Science (2005) Reyes & Chang. Structure of the ABC transporter MsbA in complex with ADP vanadate and lipopolysaccharide.
Science (2005) Pornillos et al. X-ray structure of the EmrE multidrug transporter in complex with a substrate. 310:1950-1953.
PNAS (2004) Ma & Chang. Structure of the multidrug resistance efflux transporter EmrE from E. coli.
But in 2006, others independently obtained the 3D structure of an ortholog to one of those proteins. Surprisingly, the orthologous structure was essentially a mirror-image of Geoffrey Chang's result. After rigorously double-checking his scripts, Geoffrey Chang realized that:
"an in-house data reduction program introduced a change in sign [...]"
In other words, a simple +/- sign error led to plausible and highly publishable, but dramatically flawed, results.
He retracted all five papers.
This was devastating for Geoffrey Chang, for his career, for the people working with him, for the hundreds of scientists who based follow-up analyses and experiments on the flawed 3D structures, and for the taxpayers or foundations funding the research. A small but costly mistake. (https://software.ac.uk/blog/2016-09-26-how-avoid-having-retract-your-genomics-analysis)
7.4.2. Unit tests¶
Unit testing means testing individual modules of an application in isolation (without any interaction with dependencies) to confirm that the code is doing things right.
Unit tests should be written at the same time as the code itself.
Each language has its own testing frameworks:
in Python there are many of them (unittest, pytest, ...);
in Java, JUnit is very popular.
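Before the longer real-world examples, here is a minimal, self-contained sketch with unittest (the gc_content function and its tests are hypothetical, not taken from any of the projects discussed below):

```python
import unittest


def gc_content(seq):
    """Return the fraction of G/C bases in a DNA sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


class TestGcContent(unittest.TestCase):

    def test_regular_sequence(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_edge_cases(self):
        # edge cases: empty sequence and lowercase input
        self.assertEqual(gc_content(""), 0.0)
        self.assertAlmostEqual(gc_content("gggg"), 1.0)
```

Saved as test_gc.py, this runs with python -m unittest test_gc.py (pytest also discovers unittest-style tests).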
7.4.2.1. Python example with unittest¶
7.4.2.1.1. Basic example¶
The code to test
#########################################################################
# MacSyFinder - Detection of macromolecular systems in protein dataset  #
# using systems modelling and similarity search.                        #
# Authors: Sophie Abby, Bertrand Neron                                  #
# Copyright (c) 2014-2021 Institut Pasteur (Paris) and CNRS.            #
# See the COPYRIGHT file for details                                    #
#                                                                       #
# This file is part of MacSyFinder package.                             #
#                                                                       #
# MacSyFinder is free software: you can redistribute it and/or modify   #
# it under the terms of the GNU General Public License as published by  #
# the Free Software Foundation, either version 3 of the License, or     #
# (at your option) any later version.                                   #
#                                                                       #
# MacSyFinder is distributed in the hope that it will be useful,        #
# but WITHOUT ANY WARRANTY; without even the implied warranty of        #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the          #
# GNU General Public License for more details.                          #
#                                                                       #
# You should have received a copy of the GNU General Public License     #
# along with MacSyFinder (COPYING).                                     #
# If not, see <https://www.gnu.org/licenses/>.                          #
#########################################################################


import logging
_log = logging.getLogger(__name__)
from enum import Enum

from .error import MacsypyError


class GeneBank:
    """
    Store all Gene objects. Ensure that genes are instantiated only once.
    """

    def __init__(self):
        self._genes_bank = {}

    def __getitem__(self, key):
        """
        :param key: The key to retrieve a gene.
                    The key is composed of the name of the models family and the gene name,
                    for instance CRISPR-Cas/cas9_TypeIIB ('CRISPR-Cas', 'cas9_TypeIIB') or
                    TXSS/T6SS_tssH ('TXSS', 'T6SS_tssH')
        :type key: tuple (string, string)
        :return: the Gene corresponding to the key.
        :rtype: :class:`macsypy.gene.CoreGene` object
        :raise KeyError: if the key does not exist in GeneBank.
        """
        try:
            return self._genes_bank[key]
        except KeyError:
            raise KeyError(f"No such gene '{key}' in this bank")

    def __len__(self):
        return len(self._genes_bank)

    def __contains__(self, gene):
        """
        Implement the membership test operator

        :param gene: the gene to test
        :type gene: :class:`macsypy.gene.CoreGene` object
        :return: True if the gene is in the bank, False otherwise
        :rtype: boolean
        """
        return gene in set(self._genes_bank.values())

    def __iter__(self):
        """
        Return an iterator object on the genes contained in the bank
        """
        return iter(self._genes_bank.values())

    def genes_fqn(self):
        """
        :return: the fully qualified names of all genes in the bank
        :rtype: list of str
        """
        return [f"{fam}/{gen_nam}" for fam, gen_nam in self._genes_bank.keys()]

    def add_new_gene(self, model_location, name, profile_factory):
        """
        Create a gene and store it in the bank. If the same gene (same name) is added twice,
        it is created only the first time.

        :param model_location: the location where the model family can be found.
        :type model_location: :class:`macsypy.registry.ModelLocation` object
        :param name: the name of the gene to add
        :type name: str
        :param profile_factory: The Profile factory
        :type profile_factory: :class:`profile.ProfileFactory` object.
        """
        key = (model_location.name, name)
        if key not in self._genes_bank:
            gene = CoreGene(model_location, name, profile_factory)
            self._genes_bank[key] = gene
The unit test, written with the unittest framework:
class Test(MacsyTest):

    def setUp(self):
        args = argparse.Namespace()
        args.sequence_db = self.find_data("base", "test_1.fasta")
        args.db_type = 'gembase'
        args.models_dir = self.find_data('models')
        args.res_search_dir = tempfile.gettempdir()
        args.log_level = 30
        self.cfg = Config(MacsyDefaults(), args)

        self.model_name = 'foo'
        self.model_location = ModelLocation(path=os.path.join(args.models_dir, self.model_name))
        self.gene_bank = GeneBank()
        self.profile_factory = ProfileFactory(self.cfg)

    def tearDown(self):
        try:
            shutil.rmtree(self.cfg.working_dir)
        except Exception:
            pass

    def test_add_get_gene(self):
        gene_name = 'sctJ_FLG'
        with self.assertRaises(KeyError) as ctx:
            self.gene_bank[f"foo/{gene_name}"]
        self.assertEqual(str(ctx.exception),
                         f"\"No such gene 'foo/{gene_name}' in this bank\"")
        model_foo = Model(self.model_name, 10)

        self.gene_bank.add_new_gene(self.model_location, gene_name, self.profile_factory)

        gene_from_bank = self.gene_bank[(model_foo.family_name, gene_name)]
        self.assertTrue(isinstance(gene_from_bank, CoreGene))
        self.assertEqual(gene_from_bank.name, gene_name)
        gbk_contains_before = list(self.gene_bank)
        self.gene_bank.add_new_gene(self.model_location, gene_name, self.profile_factory)
        gbk_contains_after = list(self.gene_bank)
        self.assertEqual(gbk_contains_before, gbk_contains_after)

        gene_name = "bar"
        with self.assertRaises(MacsypyError) as ctx:
            self.gene_bank.add_new_gene(self.model_location, gene_name, self.profile_factory)
        self.assertEqual(str(ctx.exception),
                         f"'{self.model_name}/{gene_name}': No such profile")

    def test_contains(self):
        model_foo = Model("foo/bar", 10)
        gene_name = 'sctJ_FLG'

        self.gene_bank.add_new_gene(self.model_location, gene_name, self.profile_factory)
        gene_in = self.gene_bank[(model_foo.family_name, gene_name)]
        self.assertIn(gene_in, self.gene_bank)

        gene_name = 'abc'
        c_gene_out = CoreGene(self.model_location, gene_name, self.profile_factory)
        gene_out = ModelGene(c_gene_out, model_foo)
        self.assertNotIn(gene_out, self.gene_bank)

    def test_iter(self):
        genes_names = ['sctJ_FLG', 'abc']
        for g in genes_names:
            self.gene_bank.add_new_gene(self.model_location, g, self.profile_factory)
        self.assertListEqual([g.name for g in self.gene_bank],
                             genes_names)

    def test_genes_fqn(self):
        genes_names = ['sctJ_FLG', 'abc']
        for g in genes_names:
            self.gene_bank.add_new_gene(self.model_location, g, self.profile_factory)
        self.assertSetEqual(set(self.gene_bank.genes_fqn()),
                            {f"{self.model_location.name}/{g.name}" for g in self.gene_bank})

    def test_get_uniq_object(self):
        gene_name = 'sctJ_FLG'
        self.gene_bank.add_new_gene(self.model_location, gene_name, self.profile_factory)
        self.gene_bank.add_new_gene(self.model_location, gene_name, self.profile_factory)
        self.assertEqual(len(self.gene_bank), 1)
7.4.2.1.2. A little more complex example¶
The code to test
import itertools
import networkx as nx


def find_best_solutions(systems):
    """
    Among the systems, choose the combinations of systems which do not share :class:`macsypy.hit.Hit`
    and maximize the sum of the systems' scores

    :param systems: the systems to analyse
    :type systems: list of :class:`macsypy.system.System` object
    :return: the list of lists of systems, where each inner list represents one best solution,
             and the solutions' score
    :rtype: tuple of 2 elements, the best solutions and their score
            ([[:class:`macsypy.system.System`, ...], [:class:`macsypy.system.System`, ...]], float score)
    """
    def sort_cliques(clique):
        """
        Sort the cliques:

        - first by the sum of hits of the systems composing the solution, most hits first
        - second by the number of systems, most systems first
        - third by the average wholeness of the systems
        - and finally by hit positions. This criterion produces predictable results
          between two runs and makes the output testable (functional_test gembase)

        :param clique: the solutions to sort
        :type clique: list of :class:`macsypy.system.System` objects
        :return: the sorted clique
        """
        l = []
        for solution in clique:
            hits_pos = {hit.position for syst in solution for hit in syst.hits}
            hits_pos = sorted(list(hits_pos))
            l.append((sorted(solution, key=lambda sys: sys.id), hits_pos))

        sorted_cliques = sorted(l, key=lambda item: (sum([len(sys.hits) for sys in item[0]]),
                                                     len(item[0]),
                                                     item[1],
                                                     sum([sys.wholeness for sys in item[0]]) / len(item[0]),
                                                     '_'.join([sys.id for sys in item[0]])
                                                     ),
                                reverse=True)
        sorted_cliques = [item[0] for item in sorted_cliques]
        return sorted_cliques

    G = nx.Graph()
    # add nodes (vertices)
    G.add_nodes_from(systems)
    # create an edge between each pair of compatible nodes
    for sys_i, sys_j in itertools.combinations(systems, 2):
        if sys_i.is_compatible(sys_j):
            G.add_edge(sys_i, sys_j)

    cliques = nx.algorithms.clique.find_cliques(G)
    max_score = None
    max_cliques = []
    for c in cliques:
        current_score = sum([s.score for s in c])
        if max_score is None or (current_score > max_score):
            max_score = current_score
            max_cliques = [c]
        elif current_score == max_score:
            max_cliques.append(c)
    # sort the solutions (cliques)
    solutions = sort_cliques(max_cliques)
    return solutions, max_score
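The clique machinery used above can be illustrated on a toy example (the systems, hits, and scores below are invented for illustration; only networkx is required):

```python
import itertools

import networkx as nx

# toy "systems": each one uses a set of hit positions and has a score
hits = {"A": {1, 2}, "B": {2, 3}, "C": {4, 5}}
scores = {"A": 1.5, "B": 2.0, "C": 3.0}

G = nx.Graph()
G.add_nodes_from(hits)
# two systems are compatible when they share no hit
for s1, s2 in itertools.combinations(hits, 2):
    if not hits[s1] & hits[s2]:
        G.add_edge(s1, s2)

# maximal cliques = maximal sets of mutually compatible systems
cliques = list(nx.find_cliques(G))  # here: [['A', 'C'], ['B', 'C']] in some order
best = max(cliques, key=lambda c: sum(scores[s] for s in c))
print(sorted(best), sum(scores[s] for s in best))  # ['B', 'C'] 5.0
```

A and B both use hit 2, so they are incompatible; the two maximal cliques are {A, C} and {B, C}, and {B, C} wins with the higher total score.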
The unit test:
def _build_systems(cfg, profile_factory):
    model_name = 'foo'
    model_location = ModelLocation(path=os.path.join(cfg.models_dir()[0], model_name))
    model_A = Model("foo/A", 10)
    model_B = Model("foo/B", 10)
    model_C = Model("foo/C", 10)
    model_D = Model("foo/D", 10)
    model_E = Model("foo/E", 10)
    model_F = Model("foo/F", 10)
    model_G = Model("foo/G", 10)
    model_H = Model("foo/H", 10)
    model_I = Model("foo/I", 10)
    model_J = Model("foo/J", 10)
    model_K = Model("foo/K", 10)

    c_gene_sctn_flg = CoreGene(model_location, "sctN_FLG", profile_factory)
    gene_sctn_flg = ModelGene(c_gene_sctn_flg, model_B)
    c_gene_sctj_flg = CoreGene(model_location, "sctJ_FLG", profile_factory)
    gene_sctj_flg = ModelGene(c_gene_sctj_flg, model_B)
    c_gene_flgB = CoreGene(model_location, "flgB", profile_factory)
    gene_flgB = ModelGene(c_gene_flgB, model_B)
    c_gene_tadZ = CoreGene(model_location, "tadZ", profile_factory)
    gene_tadZ = ModelGene(c_gene_tadZ, model_B)
    systems = {}

    systems['A'] = System(model_A, [c1, c2], cfg.redundancy_penalty())  # 5 hits
    # we need to tweak the replicon_id to have stable results
    # whatever the number of tests run
    # or the tests order
    systems['A'].id = "replicon_id_A"
    systems['B'] = System(model_B, [c3], cfg.redundancy_penalty())  # 3 hits
    systems['B'].id = "replicon_id_B"
    systems['C'] = System(model_C, [c4], cfg.redundancy_penalty())  # 4 hits
    systems['C'].id = "replicon_id_C"
    systems['D'] = System(model_D, [c5], cfg.redundancy_penalty())  # 2 hits
    systems['D'].id = "replicon_id_D"
    systems['E'] = System(model_E, [c6], cfg.redundancy_penalty())  # 1 hit
    systems['E'].id = "replicon_id_E"
    systems['F'] = System(model_F, [c7], cfg.redundancy_penalty())  # 1 hit
    systems['F'].id = "replicon_id_F"
    systems['G'] = System(model_G, [c4], cfg.redundancy_penalty())  # 4 hits
    systems['G'].id = "replicon_id_G"
    systems['H'] = System(model_H, [c5], cfg.redundancy_penalty())  # 2 hits
    systems['H'].id = "replicon_id_H"
    systems['I'] = System(model_I, [c8], cfg.redundancy_penalty())  # 2 hits
    systems['I'].id = "replicon_id_I"
    systems['J'] = System(model_J, [c9], cfg.redundancy_penalty())  # 2 hits
    systems['J'].id = "replicon_id_J"
    systems['K'] = System(model_K, [c10], cfg.redundancy_penalty())  # 2 hits
    systems['K'].id = "replicon_id_K"

    return systems


    def test_find_best_solution(self):
        systems = [self.systems[k] for k in 'ABCD']
        sorted_syst = sorted(systems, key=lambda s: (- s.score, s.id))
        # sorted_syst = [('replicon_id_C', 3.0), ('replicon_id_B', 2.0), ('replicon_id_A', 1.5), ('replicon_id_D', 1.5)]
        # replicon_id_C ['hit_sctj_flg', 'hit_tadZ', 'hit_flgB', 'hit_gspd']
        # replicon_id_B ['hit_sctj_flg', 'hit_tadZ', 'hit_flgB']
        # replicon_id_A ['hit_sctj', 'hit_sctn', 'hit_gspd', 'hit_sctj', 'hit_sctn']
        # replicon_id_D ['hit_abc', 'hit_sctn']
        # C and D are compatible: 4.5
        # B and A are compatible: 3.5
        # B and D are compatible: 3.5
        # So the expected best solution is C + D, score 4.5
        best_sol, score = find_best_solutions(sorted_syst)
        expected_sol = [[self.systems[k] for k in 'CD']]
        # The order of the solutions is not relevant,
        # nor is the order of the systems inside each solution,
        # so transform the lists into sets to compare them
        best_sol = {frozenset(sol) for sol in best_sol}
        expected_sol = {frozenset(sol) for sol in expected_sol}
        self.assertEqual(score, 4.5)
        self.assertSetEqual(best_sol, expected_sol)

        systems = [self.systems[k] for k in 'ABC']
        sorted_syst = sorted(systems, key=lambda s: (- s.score, s.id))
        # sorted_syst = [('replicon_id_C', 3.0), ('replicon_id_B', 2.0), ('replicon_id_A', 1.5)]
        # replicon_id_C ['hit_sctj_flg', 'hit_tadZ', 'hit_flgB', 'hit_gspd']
        # replicon_id_B ['hit_sctj_flg', 'hit_tadZ', 'hit_flgB']
        # replicon_id_A ['hit_sctj', 'hit_sctn', 'hit_gspd', 'hit_sctj', 'hit_sctn']
        # C is alone: 3.0
        # B and A are compatible: 3.5
        # So the expected best solution is B + A
        best_sol, score = find_best_solutions(sorted_syst)
        expected_sol = [[self.systems[k] for k in 'BA']]
        best_sol = {frozenset(sol) for sol in best_sol}
        expected_sol = {frozenset(sol) for sol in expected_sol}
        self.assertEqual(score, 3.5)
        self.assertSetEqual(best_sol, expected_sol)
7.4.2.1.3. Example with mock¶
In a unit test we test each function or method in isolation from the rest of the code. Sometimes, however, a function needs to connect to a remote server (for instance a database) or to call external software. In unit tests we replace such dependencies with mocks.
In the example below, the tested method _url_json takes as argument the URL of a GitHub repository (following the GitHub REST API), parses the JSON response, and converts it into Python objects.
def mocked_requests_get(url):
    class MockResponse:
        def __init__(self, data, status_code):
            self.data = io.BytesIO(bytes(data.encode("utf-8")))
            self.status_code = status_code

        def read(self, length=-1):
            return self.data.read(length)

        def __enter__(self):
            return self

        def __exit__(self, type, value, traceback):
            return False

    if url == 'https://test_url_json/':
        resp = {'fake': ['json', 'response']}
        return MockResponse(json.dumps(resp), 200)
    elif url == 'https://test_url_json/limit':
        raise urllib.error.HTTPError(url, 403, 'forbidden', None, None)
    elif url == 'https://api.github.com/orgs/remote_exists_true':
        resp = {'type': 'Organization'}
        return MockResponse(json.dumps(resp), 200)
    elif url == 'https://api.github.com/orgs/remote_exists_false':
        raise urllib.error.HTTPError(url, 404, 'not found', None, None)
    elif url == 'https://api.github.com/orgs/remote_exists_server_error':
        raise urllib.error.HTTPError(url, 500, 'Server Error', None, None)
    elif url == 'https://api.github.com/orgs/remote_exists_unexpected_error':
        raise urllib.error.HTTPError(url, 204, 'No Content', None, None)
    elif url == 'https://api.github.com/orgs/list_packages/repos':
        resp = [{'name': 'model_1'}, {'name': 'model_2'}]
        return MockResponse(json.dumps(resp), 200)
    elif url == 'https://api.github.com/repos/list_package_vers/model_1/tags':
        resp = [{'name': 'v_1'}, {'name': 'v_2'}]
        return MockResponse(json.dumps(resp), 200)
    elif url == 'https://api.github.com/repos/list_package_vers/model_2/tags':
        raise urllib.error.HTTPError(url, 404, 'not found', None, None)
    elif url == 'https://api.github.com/repos/list_package_vers/model_3/tags':
        raise urllib.error.HTTPError(url, 500, 'Server Error', None, None)
    elif 'https://api.github.com/repos/package_download/fake/tarball/1.0' in url:
        return MockResponse('fake data ' * 2, 200)
    elif url == 'https://api.github.com/repos/package_download/bad_pack/tarball/0.2':
        raise urllib.error.HTTPError(url, 404, 'not found', None, None)
    elif url == 'https://raw.githubusercontent.com/get_metadata/foo/0.0/metadata.yml':
        data = yaml.dump({"maintainer": {"name": "moi"}})
        return MockResponse(data, 200)
    else:
        raise RuntimeError("unexpected url in test", url)


    @patch('urllib.request.urlopen', side_effect=mocked_requests_get)
    def test_url_json(self, mock_urlopen):
        rem_exists = package.RemoteModelIndex.remote_exists
        package.RemoteModelIndex.remote_exists = lambda x: True
        remote = package.RemoteModelIndex(org="nimportnaoik")
        remote.cache = self.tmpdir
        try:
            j = remote._url_json("https://test_url_json/")
            self.assertDictEqual(j, {'fake': ['json', 'response']})
        finally:
            package.RemoteModelIndex.remote_exists = rem_exists


    @patch('urllib.request.urlopen', side_effect=mocked_requests_get)
    def test_url_json_reach_limit(self, mock_urlopen):
        rem_exists = package.RemoteModelIndex.remote_exists
        package.RemoteModelIndex.remote_exists = lambda x: True
        remote = package.RemoteModelIndex(org="nimportnaoik")
        remote.cache = self.tmpdir
        try:
            with self.assertRaises(MacsyDataLimitError) as ctx:
                remote._url_json("https://test_url_json/limit")
            self.assertEqual(str(ctx.exception),
                             """You reach the maximum number of request per hour to github.
Please wait before to try again.""")
        finally:
            package.RemoteModelIndex.remote_exists = rem_exists
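The same mocking idea in a reduced, self-contained form, using only the standard library's unittest.mock (fetch_json is a hypothetical stand-in for _url_json, not part of MacSyFinder):

```python
import json
import unittest
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_json(url):
    # hypothetical helper in the spirit of _url_json above:
    # open the URL and deserialize the JSON payload
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


class TestFetchJson(unittest.TestCase):

    @patch("urllib.request.urlopen")
    def test_fetch_json(self, mock_urlopen):
        # fake response object honouring the context-manager protocol
        fake = MagicMock()
        fake.read.return_value = json.dumps({"fake": ["json", "response"]}).encode("utf-8")
        mock_urlopen.return_value.__enter__.return_value = fake

        self.assertEqual(fetch_json("https://example.org/api"),
                         {"fake": ["json", "response"]})
        # no real network call happened; the mock recorded the URL instead
        mock_urlopen.assert_called_once_with("https://example.org/api")
```

The test therefore runs fast, offline, and deterministically, which is exactly what mocks buy us.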
7.4.2.2. Python example with pytest¶
The code to test
def classic_levenshtein(string_1, string_2):
    """
    Calculates the Levenshtein distance between two strings.

    This version is easier to read, but significantly slower than the version
    below (up to several orders of magnitude). Useful for learning, less so
    otherwise.

    Usage::

        >>> classic_levenshtein('kitten', 'sitting')
        3
        >>> classic_levenshtein('kitten', 'kitten')
        0
        >>> classic_levenshtein('', '')
        0

    """
    len_1 = len(string_1)
    len_2 = len(string_2)
    cost = 0

    if len_1 and len_2 and string_1[0] != string_2[0]:
        cost = 1

    if len_1 == 0:
        return len_2
    elif len_2 == 0:
        return len_1
    else:
        return min(
            classic_levenshtein(string_1[1:], string_2) + 1,
            classic_levenshtein(string_1, string_2[1:]) + 1,
            classic_levenshtein(string_1[1:], string_2[1:]) + cost,
        )


def wf_levenshtein(string_1, string_2):
    """
    Calculates the Levenshtein distance between two strings.

    This version uses the Wagner-Fischer algorithm.

    Usage::

        >>> wf_levenshtein('kitten', 'sitting')
        3
        >>> wf_levenshtein('kitten', 'kitten')
        0
        >>> wf_levenshtein('', '')
        0

    """
    len_1 = len(string_1) + 1
    len_2 = len(string_2) + 1

    d = [0] * (len_1 * len_2)

    for i in range(len_1):
        d[i] = i
    for j in range(len_2):
        d[j * len_1] = j

    for j in range(1, len_2):
        for i in range(1, len_1):
            if string_1[i - 1] == string_2[j - 1]:
                d[i + j * len_1] = d[i - 1 + (j - 1) * len_1]
            else:
                d[i + j * len_1] = min(
                    d[i - 1 + j * len_1] + 1,  # deletion
                    d[i + (j - 1) * len_1] + 1,  # insertion
                    d[i - 1 + (j - 1) * len_1] + 1,  # substitution
                )

    return d[-1]
The test with the pytest framework (note that each test function exercises the implementation it is named after):
from bioconvert.core.levenshtein import wf_levenshtein, classic_levenshtein


def test_classic_levenshtein():
    levenshtein_tests(classic_levenshtein)


def test_wf_levenshtein():
    levenshtein_tests(wf_levenshtein)


def levenshtein_tests(levenshtein):
    assert 1 == levenshtein('kitten', 'kittenn')
    assert 3 == levenshtein('kitten', 'sitting')
    assert 0 == levenshtein('kitten', 'kitten')
    assert 0 == levenshtein('', '')
    assert 2 == levenshtein('sitting', 'sititng')
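pytest can also turn such a table of cases into individually reported tests with @pytest.mark.parametrize. A sketch (the inline levenshtein below is a minimal one-row Wagner-Fischer stand-in for wf_levenshtein above, so the block is self-contained):

```python
import pytest


def levenshtein(s1, s2):
    # minimal one-row Wagner-Fischer implementation, standing in for
    # bioconvert.core.levenshtein.wf_levenshtein from the listing above
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(previous[j] + 1,                  # deletion
                               current[j - 1] + 1,               # insertion
                               previous[j - 1] + (c1 != c2)))    # substitution
        previous = current
    return previous[-1]


@pytest.mark.parametrize("s1, s2, expected", [
    ("kitten", "kittenn", 1),
    ("kitten", "sitting", 3),
    ("kitten", "kitten", 0),
    ("", "", 0),
    ("sitting", "sititng", 2),
])
def test_levenshtein(s1, s2, expected):
    assert levenshtein(s1, s2) == expected
```

Each tuple becomes its own test case in the pytest report, so one failing input does not hide the others.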
7.4.3. Functional tests¶
It is also easy to write functional tests with unittest (depending on how you code your entry point).
    @unittest.skipIf(not which('hmmsearch'), 'hmmsearch not found in PATH')
    def test_gembase(self):
        expected_result_dir = self.find_data("functional_test_gembase")
        args = "--db-type=gembase " \
               f"--models-dir={self.find_data('models')} " \
               "--models TFF-SF all " \
               "--out-dir={out_dir} " \
               "--index-dir {out_dir} " \
               f"--previous-run {expected_result_dir} " \
               "--relative-path"

        self._macsyfinder_run(args)
        for file_name in (self.all_systems_tsv,
                          self.all_best_solutions,
                          self.best_solution,
                          self.summary):
            with self.subTest(file_name=file_name):
                expected_result = self.find_data(expected_result_dir, file_name)
                get_results = os.path.join(self.out_dir, file_name)
                self.assertTsvEqual(expected_result, get_results, comment="#", tsv_type=file_name)
        expected_result = self.find_data(expected_result_dir, self.rejected_clusters)
        get_results = os.path.join(self.out_dir, self.rejected_clusters)
        self.assertFileEqual(expected_result, get_results, comment="#")


    def test_only_loners(self):
        expected_result_dir = self.find_data("functional_test_only_loners")
        args = "--db-type ordered_replicon " \
               "--replicon-topology linear " \
               f"--models-dir {self.find_data('models')} " \
               "-m test_loners MOB_cf_T5SS " \
               "-o {out_dir} " \
               "--index-dir {out_dir} " \
               f"--previous-run {expected_result_dir} " \
               "--relative-path"
        self._macsyfinder_run(args)

        for file_name in (self.all_systems_tsv,
                          self.all_best_solutions,
                          self.best_solution,
                          self.summary,
                          self.rejected_clusters):
            with self.subTest(file_name=file_name):
                expected_result = self.find_data(expected_result_dir, file_name)
                get_results = os.path.join(self.out_dir, file_name)
                self.assertFileEqual(expected_result, get_results, comment="#")
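When there is no importable entry point at all, a functional test can still drive the program through its command line with subprocess. A generic sketch (python --version is used here as a stand-in for the real executable):

```python
import subprocess
import sys
import unittest


class TestCliSmoke(unittest.TestCase):

    def test_version_flag(self):
        # run the whole program exactly as a user would and check its
        # observable behaviour: exit code and printed output
        proc = subprocess.run([sys.executable, "--version"],
                              capture_output=True, text=True)
        self.assertEqual(proc.returncode, 0)
        self.assertIn("Python", proc.stdout + proc.stderr)
```

Comparing the program's output files against a directory of expected results, as in the MacSyFinder tests above, follows the same pattern with a real command line.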
7.4.4. Coverage¶
To know whether our tests cover each condition in the code, there exist frameworks which run the tests and report which branches of the code are exercised or not. This operation is called test coverage. In Python, coverage is a very efficient framework for this purpose.
coverage run --source macsypy tests/run_test.py -vv
coverage html
Below is the summary output, with the overall coverage (96%) and the coverage of each Python module.

To know which parts of the code are not covered, just click on the module name.
Partially covered branches appear in yellow: these are conditions which are always True or always False during the tests.
The code not covered by any test appears in red.

A good test suite should cover more than 90% of the code.