This month, large models are even more in the news than last month: the open source BLOOM model is almost finished, Google’s LaMDA is good enough that it can trick people into thinking it’s sentient, and DALL-E has gotten even better at drawing what you ask.
The most important issue facing technology might now be the protection of privacy. While that’s not a new concern, it’s a concern that most computer users have been willing to ignore, and that most technology companies have been willing to let them ignore. New state laws that criminalize having abortions out of state and the stockpiling of location information by antiabortion groups have made privacy an issue that can’t be ignored.
- BigScience has almost finished training its open source BLOOM language model, which was developed by volunteer researchers and trained using public funds. BLOOM will provide an open, public platform for research into the capabilities of large language models and, specifically, issues like avoiding bias and toxic language.
- AI tools like AlphaFold2 can create new proteins, not just analyze existing ones; the unexpected creation of new artifacts by an AI system is playfully called “hallucination.” The proteins designed so far probably aren’t useful; still, this is a major step forward in drug design.
- Microsoft is limiting or removing access to some features in its face recognition service, Azure Face. Organizations will have to tell Microsoft how and why facial recognition will be used in their systems; and services like emotion recognition will be removed completely.
- Amazon plans to give Alexa the ability to imitate anyone’s voice, using under a minute of audio. They give the example of a (possibly dead) grandmother “reading” a book to a child. Other AI vendors (most notably OpenAI/Microsoft) have considered such mimicry unethical.
- Dolt is a SQL database that lets you version data using git commands. You can clone, push, pull, fork, branch, and merge just as with git; you access data using standard SQL.
- It’s sadly unsurprising that a robot incorporating a widely used neural network (OpenAI CLIP) learns racist and sexist biases, and that these biases affect its performance on tasks.
- Building autonomous vehicles with memory, so that they can learn about objects on the routes they drive, may be an important step in making AVs practical. In real life, most people drive over routes they already know. Autonomous vehicles should have the same advantage.
- The argument about whether Google’s LaMDA is “sentient” continues, with a Google engineer placed on administrative leave for publishing transcripts of conversations that he claimed demonstrate sentience. Or are large language models just squirrels?
- For artists working in collaboration with AI, the possibilities and imperfections of AI are a means of extending their creativity.
- Pete Warden’s proposal for ML Sensors could make developing embedded ML systems much simpler: push the machine learning into the sensors themselves.
- Researchers using DALL-E 2 discovered that the model has a “secret vocabulary” that’s not human language, but that can be used somewhat reliably to create consistent pictures. It may be an artifact of the model’s inability to say “I didn’t understand that”; given nonsense input, it is pulled towards similar words in the training corpus.
- Hugging Face has made an agreement with Microsoft that will allow Azure customers to run Hugging Face language models on the Azure platform.
- The startup Predibase has built a declarative low-code platform for building AI systems. In a declarative system, you describe the outcome you want, rather than the process for creating the outcome. The system figures out the process.
- Researchers are developing AI models that implement metamemory: the ability to remember whether or not you know something.
- As the population ages, it will be more important to diagnose diseases like Alzheimer’s early, when treatment is still meaningful. AI is providing tools that can analyze MRI images more accurately than humans, giving doctors better information to work with. These tools don’t attempt diagnosis; they provide data about brain features.
- Google has banned training deepfakes on Colab, its free Jupyter-based cloud programming platform.
- Samsung and Red Hat are working on new memory architectures and device drivers that will be able to meet the demands of a 3D-enabled, cloud-based metaverse.
- The Metaverse Standards Forum is a new industry group with the goal of solving interoperability problems for the Metaverse. It views the Metaverse as the outgrowth of the Web, and plans to coordinate work between existing standards groups (like the W3C) relevant to the Metaverse.
- Can the “Open Metaverse” be the future of the Internet? The Open Metaverse Interoperability Group is building vendor-independent standards for social graphs, identities, and other elements of a Metaverse.
- Holographic heads-up displays allow for 3D augmented reality: the ability to project 3D images onto the real world (for example, onto a car’s windshield).
- Google’s Visual Positioning Service uses the data they’ve collected through Street View to provide high-accuracy positioning data for augmented reality applications. (This may be related to Niantic’s VPS, or they may just be using the same acronym.)
- With the end of Roe v. Wade, personal data, including search histories and location data, could be used to prosecute women who have abortions. Data brokers already collect and sell this data. It is unclear how large Internet companies that also collect this data will respond. (Google has announced that they will delete location histories that include visits to sensitive locations.)
- Security researchers have identified over 900,000 Kubernetes clusters that are exposed (and possibly vulnerable) to malicious scans. 65% of them are in the US.
- Sonatype has discovered a number of modules in Python’s PyPI repository that steal AWS credentials and other important data. Supply chain security will continue to be a problem for developers, regardless of the programming language or problem domain.
- Microsoft’s analysis of Russia’s cyberwar efforts shows that Russia has increasingly attacked resources in countries allied with Ukraine (most notably the US), and that on-premises government computers are especially vulnerable.
- Working with Fastly and Cloudflare, Apple has developed a service called Automatic Verification that eliminates the need for Captchas. According to rumors, it will be enabled by default in the beta of iOS 16.
- A surprisingly small botnet (only 5,000 hosts) generated a record-setting DDoS attack that peaked at 26M HTTPS requests per second. The botnet was so powerful because most of its devices belonged to cloud providers. Cloudflare’s free service was able to mitigate the attack.
- A different kind of attack against neural networks: present them with inputs that drive worst-case energy consumption, forcing processors to reduce their clock speed or even overheat.
- A new attack called Hertzbleed uses small variations in a processor’s clock speed while it is processing encryption keys to guess those keys. Intel and AMD CPUs are vulnerable. While this attack may never be seen in the wild, it shows how the complexity of modern processors creates vulnerabilities.
- Symbiote is a new kind of malware that attacks Linux, injects software into all running processes, and uses Berkeley packet filters (eBPF) to steal data and create covert communications channels. Symbiote uses dynamic linker hijacking to link executables to modified system libraries at run time.
- In the first quarter of 2022, the number of known ransomware attacks was down 40%, largely due to the disappearance of the Conti ransomware group. This drop is probably only temporary. Tactics also changed; attackers aren’t announcing the names of their victims publicly, preferring to negotiate a ransom privately.
- Amazon has launched CodeWhisperer, a direct competitor to GitHub Copilot.
- Linus Torvalds predicts that Rust will be used in the Linux kernel by 2023.
- GitHub Copilot is now generally available (for a price); it’s free to students and open source maintainers. Corporate licenses will be available later this year.
- WebAssembly is making inroads. Wasmer, “the universal WebAssembly runtime,” aims to run any code on any platform. Impressive, if it delivers.
- Can WebAssembly replace Docker? Maybe, in some applications. WASM provides portability and eliminates some security issues (possibly introducing its own); Docker sets up environments.
- Mozilla’s Project Bergamot is an automated translation tool designed for use on the Web. It can be used to build multilingual forms and other web pages. Unlike most other AI technologies, Bergamot runs in the browser using WASM. No data is sent to the cloud.
- Microsoft has released a framework called Fluid for building collaborative apps, such as Slack, Discord, and Teams. Microsoft will also be releasing Azure Fluid Relay to support Fluid-based applications.
- Dragonfly is a new in-memory database that claims significantly faster performance than memcached and Redis.
- The Chinese government has blocked access to open source code on Gitee, the Chinese equivalent to GitHub, saying that all code must be reviewed by the government before it can be released to the public.
- Is Blockchain Decentralized? A study commissioned by DARPA investigates whether a blockchain is truly immutable: can it be modified, not by exploiting cryptographic vulnerabilities, but by attacking the blockchain’s implementation, networking, and consensus protocols? This is the most comprehensive examination of blockchain security that we’ve seen.
- Jack Dorsey has announced that he’s working on Web5, which will be focused on identity management and be based on Bitcoin.
- Molly White’s post questioning the possibility of acceptably non-dystopian self-sovereign identity is a must-read; she has an excellent summary and critique of just about all the work going on in the field.
- Cryptographer Matthew Green makes an important argument for the technologies behind cryptocurrency (though not for the current implementations).
- Probabilistic computers, built from probabilistic bits (p-bits), may provide a significant step forward for probabilistic decision making. This sounds esoteric, but it’s essentially what we’re asking AI systems to do. P-bits may also be able to simulate qubits and quantum computing.
- A system that links two time crystals could be the basis for a new form of quantum computing. Time crystals can exist at room temperature, and remain coherent for much longer than existing qubit technologies.
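To make the Dolt item above a little more concrete, here’s a toy sketch of what “git for data” means: commits that snapshot a table’s rows, branches that point at commits, and a row-level diff between two commits. `VersionedTable` and its methods are hypothetical names of ours, not Dolt’s API; the sketch skips merging and says nothing about how Dolt actually stores data. In Dolt itself, you’d use git-style commands for versioning and standard SQL for queries.

```python
# A toy, in-memory sketch of git-style versioning for table rows.
# Hypothetical: VersionedTable is not Dolt's API, and Dolt stores data very differently.
import copy
import hashlib
import json


class VersionedTable:
    """Rows keyed by a string primary key, with commits, branches, and diffs."""

    def __init__(self):
        self.commits = {}               # commit id -> (parent id, message, snapshot of rows)
        self.branches = {"main": None}  # branch name -> commit id (None = no commits yet)
        self.head = "main"              # currently checked-out branch
        self.working = {}               # uncommitted working copy of the rows

    def upsert(self, key, row):
        self.working[key] = dict(row)

    def delete(self, key):
        self.working.pop(key, None)

    def commit(self, message):
        parent = self.branches[self.head]
        snapshot = copy.deepcopy(self.working)
        # Content-address the commit, like git: hash the parent, message, and data.
        payload = json.dumps([parent, message, snapshot], sort_keys=True)
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:10]
        self.commits[commit_id] = (parent, message, snapshot)
        self.branches[self.head] = commit_id
        return commit_id

    def branch(self, name):
        # A new branch starts at the current branch's latest commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # Switch branches, replacing the working copy (uncommitted changes are discarded).
        self.head = name
        self.working = copy.deepcopy(self._rows_at(self.branches[name]))

    def diff(self, old_id, new_id):
        old, new = self._rows_at(old_id), self._rows_at(new_id)
        return {
            "added": sorted(k for k in new if k not in old),
            "removed": sorted(k for k in old if k not in new),
            "changed": sorted(k for k in new if k in old and new[k] != old[k]),
        }

    def _rows_at(self, commit_id):
        return self.commits[commit_id][2] if commit_id else {}


if __name__ == "__main__":
    t = VersionedTable()
    t.upsert("alice", {"city": "Boston"})
    first = t.commit("add alice")
    t.branch("feature")
    t.checkout("feature")
    t.upsert("alice", {"city": "Denver"})
    t.upsert("bob", {"city": "Austin"})
    second = t.commit("move alice, add bob")
    print(t.diff(first, second))  # {'added': ['bob'], 'removed': [], 'changed': ['alice']}
    t.checkout("main")
    print(t.working)              # {'alice': {'city': 'Boston'}}
```

Hashing the parent along with the data is what makes history tamper-evident in git-style systems: change an old commit and every descendant’s ID changes too.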
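The Predibase item turns on the difference between declarative and imperative systems. Here’s a minimal sketch of the declarative side, assuming a made-up spec format (`input`, `output`) and a tiny hypothetical engine (`build`) that tries a couple of candidate models and keeps whichever has the lowest validation error. None of this is Predibase’s actual interface; the point is only that the caller states what to predict and the engine decides how.

```python
# A minimal sketch of a declarative interface: the spec says WHAT to predict,
# and the (hypothetical) engine decides HOW by comparing candidate models.
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_line}


def mean_abs_error(model, xs, ys):
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def build(spec, rows):
    """Interpret a spec like {"input": "x", "output": "y"} and return (name, model)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to train and validate")
    xs = [row[spec["input"]] for row in rows]
    ys = [row[spec["output"]] for row in rows]
    # Hold out roughly the last quarter of the rows for validation (at least one row).
    split = min(len(rows) - 1, max(1, int(len(rows) * 0.75)))
    train_x, train_y, val_x, val_y = xs[:split], ys[:split], xs[split:], ys[split:]
    scored = []
    for name, fit in CANDIDATES.items():
        model = fit(train_x, train_y)
        scored.append((mean_abs_error(model, val_x, val_y), name, model))
    _, name, model = min(scored, key=lambda item: item[0])
    return name, model


if __name__ == "__main__":
    rows = [{"x": x, "y": 2 * x + 1} for x in range(8)]
    name, model = build({"input": "x", "output": "y"}, rows)
    print(name, model(10))  # linear 21.0
```

A real declarative platform would search over far more than two candidates (architectures, preprocessing, hyperparameters), but the contract is the same: the spec never mentions training steps.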
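Metamemory, from the item above, is easiest to picture as a check that runs before retrieval: do I know this at all? The sketch below is a loose analogy, not anything from the research: a hypothetical `MetaMemory` store scores how close a question is to what it has learned (string similarity from Python’s `difflib` stands in for a model’s learned confidence) and abstains when the score falls below a cutoff, rather than answering with its closest guess.

```python
# A loose analogy for metamemory: check "do I know this?" before answering.
# Hypothetical: string similarity stands in for a model's learned confidence.
import difflib


class MetaMemory:
    def __init__(self, cutoff=0.75):
        self.facts = {}        # normalized question -> answer
        self.cutoff = cutoff   # minimum feeling-of-knowing score needed to answer

    @staticmethod
    def _normalize(text):
        return " ".join(text.lower().split())

    def learn(self, question, answer):
        self.facts[self._normalize(question)] = answer

    def _best_match(self, question):
        """Return (score, stored question) for the closest stored question."""
        q = self._normalize(question)
        return max(
            ((difflib.SequenceMatcher(None, q, known).ratio(), known) for known in self.facts),
            default=(0.0, None),
        )

    def feeling_of_knowing(self, question):
        """Score in [0, 1]: how close is this question to anything already learned?"""
        return self._best_match(question)[0]

    def answer(self, question):
        score, known = self._best_match(question)
        if score < self.cutoff:
            return None  # abstain: "I don't know" rather than a confident guess
        return self.facts[known]


if __name__ == "__main__":
    mem = MetaMemory()
    mem.learn("What is the capital of France?", "Paris")
    print(mem.answer("what is the capital of france"))
    print(mem.answer("best pizza topping"))
```

The interesting part is the separation: `feeling_of_knowing` can be asked on its own, without retrieving anything, which is the distinction metamemory research draws between knowing and knowing that you know.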
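Symbiote’s dynamic linker hijacking rides on the same mechanism as legitimate preloading: libraries named in `/etc/ld.so.preload` are loaded into every dynamically linked program, and those in the `LD_PRELOAD` environment variable into any process that inherits it, ahead of the program’s other libraries. A minimal first-pass check, assuming a glibc-based Linux system, is to list what’s configured there. Treat this as a sketch, not a detector: malware that hooks libc functions can hide its own files from the very process doing the checking, so an empty result doesn’t prove a host is clean, and some legitimate tools also use preloading. The function name `preload_entries` is ours.

```python
# First-pass check for dynamic-linker preloading on a glibc-based Linux system.
# A sketch only: malware that hooks libc can hide itself from this very check.
import os


def preload_entries(preload_file="/etc/ld.so.preload", environ=None):
    """List (source, library) pairs the dynamic linker would preload."""
    environ = os.environ if environ is None else environ
    findings = []
    # LD_PRELOAD entries may be separated by spaces or colons.
    for lib in environ.get("LD_PRELOAD", "").replace(":", " ").split():
        findings.append(("LD_PRELOAD", lib))
    # /etc/ld.so.preload holds a whitespace-separated list that applies system-wide.
    try:
        with open(preload_file, encoding="utf-8", errors="replace") as fh:
            for lib in fh.read().split():
                findings.append((preload_file, lib))
    except FileNotFoundError:
        pass  # the usual case: no system-wide preload configured
    return findings


if __name__ == "__main__":
    entries = preload_entries()
    if not entries:
        print("No preloaded libraries found (not proof the host is clean).")
    for source, lib in entries:
        print(f"{source}: {lib}")
```

For a more trustworthy answer, run checks like this from a known-good environment (for example, inspecting the disk from separate boot media) rather than from the possibly compromised system itself.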
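The p-bit item sounds esoteric, but the usual model is simple: a p-bit outputs +1 or -1 at random, and its input biases which way it tends to fall. One common formulation is m = sgn(tanh(I) - r), with r drawn uniformly from [-1, 1], which makes the average output tanh(I). Wiring p-bits together, with each bit’s input computed from its neighbors’ current states, turns the network into a sampler (effectively Gibbs sampling) for a Boltzmann-style probability distribution. The sketch below simulates this in software; the function names and the two-bit examples are ours, and real p-bits would be hardware devices, not Python.

```python
# Software simulation of probabilistic bits (p-bits), for illustration only.
# Uses the common formulation m = sgn(tanh(I) - r), with r uniform in [-1, 1].
import math
import random


def p_bit(current, rng):
    """Return +1 or -1 at random; the average output is tanh(current)."""
    return 1 if math.tanh(current) > rng.uniform(-1.0, 1.0) else -1


def sample_network(J, h, beta, steps, seed=0):
    """Update coupled p-bits one at a time (Gibbs sampling) and record each sweep.

    J is a symmetric coupling matrix with zeros on the diagonal, h holds biases,
    and beta scales how strongly the inputs steer each bit.
    """
    rng = random.Random(seed)
    n = len(h)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    for _ in range(steps):
        for i in range(n):
            current = beta * (sum(J[i][j] * m[j] for j in range(n) if j != i) + h[i])
            m[i] = p_bit(current, rng)
        samples.append(tuple(m))
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    outputs = [p_bit(0.5, rng) for _ in range(20000)]
    print("mean output:", sum(outputs) / len(outputs), "vs tanh(0.5):", math.tanh(0.5))

    # Two p-bits with a positive coupling should mostly agree with each other.
    samples = sample_network(J=[[0, 1], [1, 0]], h=[0, 0], beta=2.0, steps=5000)
    agree = sum(a == b for a, b in samples) / len(samples)
    print("fraction of sweeps where the bits agree:", agree)
```

With a coupling of 1 and beta = 2.0, each bit follows the other with probability 1/(1 + e^-4), about 0.98, so the agreement fraction should land near that. Flip the coupling’s sign and the bits mostly disagree instead, which is how couplings encode the constraints of a problem.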