There are two sources of data on the UK’s companies and sectors, but neither is quite fit for purpose. That’s the view of Alex Craven, Co-Founder and Director of The Data City.
At the Leeds-based company, Craven and a team of 10 have introduced machine learning to the categorisation of UK businesses in an effort to keep up with the pace of change.
“I assumed the government would know what our businesses do, but I was completely wrong,” he told Prolific North.
Craven explained that one source of business data, Companies House, categorises businesses by a Standard Industrial Classification or SIC code, which doesn’t allow for firms in emerging industries to properly identify themselves.
It’s a problem he experienced first-hand.
In 2014 Craven, then founder of Leeds agency Bloom, opened an industry report on UK agencies only to find his company was not there, despite being a top five agency in the city.
He explained that the SIC code for ‘IT and Computing’ was separate to the ‘Marketing and Advertising’ code in which Bloom was categorised, and theorised that this had led to the agency’s absence from the report.
The existential quandary faced by emerging businesses is not limited to agencies, he said. Emerging sectors such as FinTech could include everything from big banks to tech startups.
“Are the training providers part of the sector or the recruitment agents which specialise in a sector part of the sector? What about the consultancies?” he asked.
“Fundamentally the government doesn’t have any data on the emerging economy, which is a bit shocking,” he said.
The second place to find and categorise UK companies is via a search engine, but, Craven added, these results can’t be downloaded and analysed. Further, the first result is often the company with the highest SEO spend, rather than the most relevant.
To combat this, The Data City is creating an alternative to SIC codes which it calls an RTIC, or Real Time Industry Classification. The company’s mission statement, he said, was “to become the new standard in industry data for the emerging economy.”
The Data City’s offering is two-fold. Firstly, it has matched the Companies House records of over a million companies with the relevant company websites.
Secondly, it has introduced machine learning, guided by sector experts, to help scan the data from those websites and better classify what a company really does.
Every month, the websites are rescanned and the data is updated, and sectors are reclassified annually so that the classification stays on top of those which have matured.
The company is currently working to define sectors such as advanced manufacturing, clean energy, cybersecurity, AI, quantum computing, and 5G infrastructure.
A software interface enables its users – which include policy makers, economists and investors as well as the government – to build a list of companies not by industry or keyword, but by their similarity to other companies.
“Because machine learning doesn’t have any bias, it just takes the text from those websites and looks for others that do similar things,” he explained.
“It finds companies that the experts weren’t aware of. By adding a company to the data-set, you might find that the company is not on its own, and there are other companies that do the same thing.”
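To make the idea of building a list by similarity concrete, here is a minimal sketch in Python. It is not The Data City’s method, which the article does not detail; it simply assumes each company’s website text can be reduced to word counts and compared with cosine similarity. The company names, the sample text and the function names are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed_name, site_text, top_n=3):
    """Rank every other company by how closely its website text matches the seed's."""
    vectors = {name: Counter(tokenize(text)) for name, text in site_text.items()}
    seed = vectors[seed_name]
    scores = [
        (name, cosine_similarity(seed, vec))
        for name, vec in vectors.items()
        if name != seed_name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical website text, invented for this example.
SITE_TEXT = {
    "Alpha Pay": "mobile payments platform for online banking and card payments",
    "Beta Bank Tech": "digital banking app with instant card payments and mobile wallet",
    "Gamma Farms": "organic vegetables delivered weekly from our family farm",
}
```

Calling `most_similar("Alpha Pay", SITE_TEXT)` ranks the other two companies by textual overlap with the seed, so the payments-related firm comes ahead of the farm. A production system would presumably use far richer text models and expert review, as Craven describes.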
Expansion plans
The company has opened an office in Holland as part of its European expansion, and is looking to North and South America as well as Asia on its roadmap.
Craven said that towards the end of the year the currently bootstrapped firm will look for funding to “repeat our mission statement globally”, though solving a country’s business data could look very different overseas.
“In America, company financial information is very hard to get because there’s no equivalent of Companies House and you’re not obligated to publish your accounts unless you are a listed business,” he said.