Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data. Code: https://github.com/sclbd/deepfakebench
This paper presents the first conversational agent that supports the full generality of hybrid data access for large knowledge corpora, through a language we developed called SUQL (Structured and Unstructured Query Language). Code: https://github.com/stanford-oval/suql
Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives. Code: https://github.com/mlc-ai/web-llm
Visual language models (VLMs) rapidly progressed with the recent success of large language models. Code: https://github.com/efficient-large-model/vila
Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. Code: https://github.com/cpacker/memgpt
In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model. Code: https://github.com/ml-gsai/microdreamer
This collaborative prompting approach empowers a single LM to simultaneously act as a comprehensive orchestrator and a panel of diverse experts, significantly enhancing its performance across a wide array of tasks. Code: https://github.com/suzgunmirac/meta-prompting
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Code: https://github.com/jinhualiang/wavcraft
However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. Code: https://github.com/runyiyang/sundae
A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. Code: https://github.com/nvlabs/radio
Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. Code: https://github.com/2471023025/ralm_survey
Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. Code: https://github.com/facebookresearch/lightplane
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. Code: https://github.com/prometheus-eval/prometheus-eval
This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation. Code: https://github.com/hvision-nku/storydiffusion
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). Code: https://github.com/kindxiaoming/pykan
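The core structural idea behind KANs is that the learnable pieces live on the edges as univariate functions, and nodes merely sum their inputs. A minimal sketch of that idea (illustrative only, not the pykan API; the paper uses B-splines, whereas this toy uses cubic polynomials):

```python
import numpy as np

# Toy KAN layer: each edge (j, i) carries its own learnable univariate
# function, here a cubic polynomial phi(x) = c0 + c1*x + c2*x^2 + c3*x^3.
# A node has no fixed activation and no weight matrix; it just sums the
# edge-transformed inputs. This is a sketch, not the pykan implementation.

def edge_fn(x, coeffs):
    """Evaluate one edge's learnable univariate function (cubic polynomial)."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def kan_layer(x, coeff_grid):
    """x: (n_in,) input vector; coeff_grid: (n_out, n_in, 4) coefficients.
    Output j is the sum over inputs i of phi_{j,i}(x_i)."""
    n_out, n_in, _ = coeff_grid.shape
    return np.array([
        sum(edge_fn(x[i], coeff_grid[j, i]) for i in range(n_in))
        for j in range(n_out)
    ])

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])
coeffs = rng.normal(size=(3, 2, 4))  # 2 inputs -> 3 outputs
y = kan_layer(x, coeffs)
print(y.shape)  # (3,)
```

Training would fit the per-edge coefficients by gradient descent, exactly where an MLP would instead fit a weight matrix in front of a fixed activation.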
MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. Code: https://github.com/chenyangzhu1/multibooth
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. Code: https://github.com/tothebeginning/pulid
We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Code: https://github.com/FoundationVision/Groma
Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. Code: https://github.com/snap-stanford/stark
Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Code: https://github.com/MangoKiller/MolTC
PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks. Code: https://github.com/magic-research/PLLaVA
Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. Code: https://github.com/zzxslp/som-llava
Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code: https://github.com/opengvlab/internvl
Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. Code: https://github.com/ToruOwO/hato
ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Code: https://github.com/JackAILab/ConsistentID
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. Code: https://github.com/microsoft/FILM
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Code: https://github.com/dcharatan/flowmap
Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. Code: https://github.com/ericlbuehler/mistral.rs
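The gating strategy described above can be sketched in a few lines: a small gate maps the hidden state to softmax weights over the pre-trained adapters, and the layer's effective update is the weighted mix of their low-rank deltas. All names and shapes below are illustrative assumptions, not the mistral.rs or X-LoRA API:

```python
import numpy as np

# Sketch of an X-LoRA-style gated mixture of LoRA adapters.
# `gate_W` is a hypothetical gating projection; in practice the gate is a
# trained network and the mixing happens per layer and per token.

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def xlora_forward(h, W, adapters, gate_W):
    """h: (d,) hidden state; W: (d, d) frozen base weight;
    adapters: list of LoRA pairs (A, B) with A (d, r), B (r, d);
    gate_W: (d, n_adapters) gating projection."""
    weights = softmax(h @ gate_W)                       # mixing weights from the hidden state
    delta = sum(w * (A @ B) for w, (A, B) in zip(weights, adapters))
    return h @ (W + delta)                              # base weight plus mixed low-rank update

rng = np.random.default_rng(0)
d, r, n = 8, 2, 3
h = rng.normal(size=d)
W = rng.normal(size=(d, d))
adapters = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(n)]
gate_W = rng.normal(size=(d, n))
out = xlora_forward(h, W, adapters, gate_W)
print(out.shape)  # (8,)
```

Because the weights depend on the hidden state, different inputs activate different adapter combinations at different layers, which is the "never-before-used deep layer-wise combinations" claim.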
Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. Code: https://github.com/apple/corenet
In this paper, we explore LLMs as copilots that assist humans in proving theorems. Code: https://github.com/lean-dojo/leancopilot
The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. Code: https://github.com/princeton-vl/multislam_diffpose
The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". Code: https://github.com/holmeswww/agentkit
We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications. Code: https://github.com/ailab-cvc/seed-x
The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video. Code: https://github.com/Jyxarthur/flowsam
Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens. Code: https://github.com/fasterdecoding/snapkv
We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding. Code: https://github.com/ez-hwh/autocrawler
We propose a new metric to assess personality generation capability based on this evaluation method. Code: https://github.com/hiyouga/llama-factory
Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Code: https://github.com/id-animator/id-animator
Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Code: https://github.com/yisol/IDM-VTON
To this end, we release OpenELM, a state-of-the-art open language model. Code: https://github.com/apple/corenet
However, key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck by growing linearly in size with the sequence length. Code: https://github.com/Infini-AI-Lab/TriForce
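The linear growth is easy to quantify: per token, each layer stores one key and one value vector per KV head. A back-of-the-envelope calculation with illustrative, Llama-2-7B-like dimensions (assumed here, not taken from the TriForce paper):

```python
# KV-cache size grows linearly with sequence length:
# 2 (key + value) * seq_len * n_layers * n_kv_heads * head_dim * bytes/elem.
# Dimensions below are illustrative (roughly Llama-2-7B with fp16 cache).

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):  # fp16
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.1f} GiB")
# 4096 -> 2.0 GiB, 32768 -> 16.0 GiB, 131072 -> 64.0 GiB
```

At 128K tokens the cache alone (64 GiB here) dwarfs the ~13 GiB of fp16 model weights, which is why long-context serving work targets the KV cache specifically.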
This paper introduces RecAI, a practical toolkit designed to augment or even revolutionize recommender systems with the advanced capabilities of Large Language Models (LLMs). Code: https://github.com/microsoft/recai
SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types. Code: https://github.com/FaceOnLive/Face-Liveness-Detection-SDK-Linux
To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. Code: https://github.com/liming-ai/ControlNet_Plus_Plus
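The cycle-consistency objective named above can be illustrated with stand-in functions: generate an image from a control map, re-extract the control from the generated image with a discriminative model, and penalize the pixel-wise disagreement. Both models below are toy placeholders, not the ControlNet++ implementation:

```python
import numpy as np

# Toy cycle-consistency sketch: generator and condition extractor are
# placeholder functions chosen to be (approximate) inverses, so the loss
# is near zero when the cycle closes. In the real method both are
# learned networks and the loss drives generator training.

def generate(control):
    """Stand-in for a controllable generator."""
    return np.tanh(control)

def extract_control(image):
    """Stand-in for a condition extractor (e.g. an edge or depth model)."""
    return np.arctanh(np.clip(image, -0.99, 0.99))

def cycle_consistency_loss(control):
    image = generate(control)
    reconstructed = extract_control(image)
    return float(np.mean((control - reconstructed) ** 2))

c = np.linspace(-0.5, 0.5, 16).reshape(4, 4)
print(round(cycle_consistency_loss(c), 6))  # ~0.0: the toy cycle closes
```

The point is the structure of the objective: the control is compared against itself after a generate-then-extract round trip, at the pixel level.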
In the realm of security applications, biometric authentication systems play a crucial role, yet one often encounters challenges concerning privacy and security while developing one. Code: https://github.com/Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition
We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Code: https://github.com/facebookresearch/llm-transparency-tool
Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Code: https://github.com/facebookresearch/generative-recommenders
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. Code: https://github.com/Beomi/InfiniTransformer
This enables our method - namely LAndmark-based Facial Self-supervised learning (LAFS) - to learn key representations that are more critical for face recognition. Code: https://github.com/Recognito-Vision/Face-SDK-Linux-Demos
In this work, we highlight the following pitfall of prefilling: for batches containing high-varying prompt lengths, significant computation is wasted by the standard practice of padding sequences to the maximum length. Code: https://github.com/siyan-zhao/prepacking
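The waste from padding is easy to see by counting tokens: with padding, every prompt in the batch is processed at the length of the longest one, while packing processes only real tokens. A minimal sketch with made-up prompt lengths (illustrative only, not the repository's code):

```python
# Token counts processed during prefill for a batch of variable-length
# prompts: padding expands every sequence to the batch maximum, whereas
# packing (as in prepacking) keeps only the real tokens.

def padded_tokens(lengths):
    """Tokens processed when every prompt is padded to the batch max."""
    return len(lengths) * max(lengths)

def packed_tokens(lengths):
    """Tokens processed when prompts are packed without padding."""
    return sum(lengths)

lengths = [12, 100, 2048, 7, 512]  # hypothetical prompt lengths in one batch
padded, packed = padded_tokens(lengths), packed_tokens(lengths)
print(f"padded: {padded}, packed: {packed}, wasted: {1 - packed / padded:.0%}")
# padded: 10240, packed: 2679, wasted: 74%
```

One long outlier prompt is enough to make the padded batch mostly pad tokens; the actual method additionally needs a block-diagonal attention mask and per-prompt position IDs so packed prompts do not attend to each other.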